WO2025081064A2 - Thermophilic deaminase and methods for identifying modified cytosine - Google Patents
- Publication number
- WO2025081064A2 (PCT/US2024/051081)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- deaminase
- engineered
- ssdna
- dna
- nucleic acid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
Definitions
- the present disclosure provides engineered deaminases that bind to and deaminate single-stranded DNA (ssDNA) at a temperature that destabilizes ssDNA secondary structure.
- the engineered deaminases comprise one or more modifications relative to a wild-type enzyme.
- the wild-type enzyme is a tRNA deaminase.
- FIG. 2B is a schematic representation of the NGS assay.
- the ssDNA oligo substrate contains 17 unmethylated cytosines (C) and 16 methylated cytosines (mC).
- C unmethylated cytosines
- mC methylated cytosines
- Compositions and kits comprising the engineered deaminases, polynucleotides encoding the engineered deaminases, and methods of using the engineered deaminases to detect the locations of modified cytosine (modC) in nucleic acids are also provided.
- the inventors also generated variants of met7 with increased deaminase activity by introducing substitution mutations into various active site-adjacent residues of met7.
- the one or more modifications comprise a substitution mutation.
- Active site-adjacent residues are amino acid residues that are close to the active site of an enzyme. Examples of active site-adjacent residues found in met7 include, without limitation, P20, R22, T23, A40, D41, D42, E43, T63, A64, P66, G86, R87, G88, R89, G90.
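Substitution mutations at residues like those listed above are conventionally written as wild-type residue, 1-based position, and replacement residue (for example, E46A). The minimal Python sketch below applies such labels to a protein sequence and checks that the stated wild-type residue is actually present. The met7 sequence is not reproduced in this excerpt, so the usage relies on a hypothetical toy sequence, and `apply_substitutions` is an illustrative name, not part of any described tool.

```python
import re

# Standard point-substitution notation, e.g. "E46A":
# wild-type residue, 1-based position, replacement residue.
MUTATION_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Return a copy of `sequence` carrying the listed substitutions.

    Raises ValueError if a label is malformed, out of range, or names a
    wild-type residue that does not match the sequence.
    """
    residues = list(sequence)
    for label in mutations:
        match = MUTATION_PATTERN.match(label)
        if match is None:
            raise ValueError(f"malformed mutation label: {label}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {label}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{label}: expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)
```

For example, `apply_substitutions("MPRT", ["P2A"])` returns `"MART"`, while a label whose wild-type residue does not match (such as `"R2A"`) raises `ValueError`.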
- TET2-based methods can also be used to distinguish between modified and unmodified C.
- TET2 is used to oxidize modified C (mC and hmC), which protects them from cytidine deamination.
- a cytidine deaminase is used to convert unprotected, unmodified C to U, and the positions of modified C are then inferred from the absence of a C to T transition mutation.
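The inference step described above can be sketched in Python as follows, assuming a single read already aligned to its reference without gaps, and assuming that protected (modified) C still reads as C while deaminated (unmodified) C reads as T after amplification. The function name and the "ambiguous" category for other bases are illustrative choices, not part of the source.

```python
def classify_reference_cytosines(reference: str, read: str) -> dict[int, str]:
    """Classify each reference C (0-based position) from one gap-free aligned read."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    calls = {}
    for position, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[position] = "modified"    # protected from deamination
        elif read_base == "T":
            calls[position] = "unmodified"  # deaminated C -> U, read as T
        else:
            calls[position] = "ambiguous"   # sequencing error or genuine variant
    return calls
```

With reference `"ACGTCA"` and read `"ACGTTA"`, the C at position 1 is called modified and the C at position 4 unmodified.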
- the inventors demonstrate that met7 has increased ssDNA deaminase activity relative to CDAT8.
- the engineered deaminase has increased ssDNA deaminase activity relative to the wild-type enzyme.
- the ssDNA deaminase activity of the engineered deaminase is at least 25% greater, at least 50% greater, at least 100% (2-fold) greater, at least 10 times greater, at least 20 times greater, or at least 100 times greater than the ssDNA deaminase activity of the wild-type enzyme.
- ssDNA deaminase activity can be assessed using a variety of in vitro assays, including the USER cleavage assay, NGS assay, and SwaI assay described in Example 1.
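Because the thresholds above mix percentages and fold changes, a short Python sketch shows how the two relate: a fold change F corresponds to a percent increase of (F - 1) x 100, so "100% greater" equals 2-fold. The activity values are placeholders for whatever readout an assay produces; no measured values from the source are implied.

```python
def activity_increase(engineered: float, wild_type: float) -> tuple[float, float]:
    """Return (percent increase, fold change) of engineered relative to wild-type activity."""
    if wild_type <= 0:
        raise ValueError("wild-type activity must be positive")
    fold_change = engineered / wild_type
    percent_increase = (fold_change - 1.0) * 100.0
    return percent_increase, fold_change


# 100% greater corresponds to a 2-fold change
print(activity_increase(4.0, 2.0))  # (100.0, 2.0)
```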
- a pair-wise comparison analysis of amino acid sequences can be conducted, for instance, by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci.
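As a minimal illustration of global pairwise alignment in the spirit of Needleman & Wunsch, the Python sketch below aligns two sequences with a simple match/mismatch/linear-gap scoring scheme and reports percent identity over the alignment length. The scoring values are assumptions chosen for readability; practical protein comparisons typically use substitution matrices and affine gap penalties, and percent identity can be defined over different denominators.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns aligned strings and score."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap positions divided by alignment length, times 100."""
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)
```

For instance, aligning `"ACGT"` with `"AGT"` gives `"ACGT"` over `"A-GT"` with score 2 and 75.0% identity under these scores.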
- the polynucleotide may encode either a mature protein or a protein having a pro-sequence, including that encoding a leader sequence on the preprotein which is then cleaved by the host cell to form a mature protein.
- the vectors may be, for example, plasmid, virus, or phage vectors provided with an origin of replication, and optionally a promoter for the expression of said nucleotide and optionally a regulator of the promoter.
- the vectors may contain one or more selectable markers, such as, for example, an antibiotic resistance gene.
- the vectors optionally include generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, or prokaryotes, or both, (e.g., shuttle vectors) and selection markers for both prokaryotic and eukaryotic systems.
- Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or both.
- C to T point mutations can be easily identified, and these point mutations are inferred as 5mC positions.
- an engineered deaminase having 5hmC-defective deaminase activity, i.e., one that preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC
- positions lacking C to T point mutations can be easily identified, and these positions are inferred as 5hmC positions.
- primers used for an amplification bind with greater affinity to a nucleic acid that includes T nucleotides where 5mC nucleotides were present prior to treatment.
- the annealing of a primer to a predetermined sequence that includes the expected 5mC to T conversion(s) allows one to infer the location of a modified cytosine in the untreated target nucleic acid.
- a primer that binds with greater affinity to a nucleic acid that includes T nucleotides where 5mC nucleotides were present prior to treatment can include at least 1, at least 2, at least 3, at least 4, or at least 5 nucleotides that base-pair with a nucleotide resulting from conversion of 5mC to T, i.e., an adenine (A); when amplification is used, a second primer for the reverse strand can include a T instead of a guanine (G).
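The in silico design step can be sketched in Python: each expected 5mC position in the target is rewritten as T, and an oligo that anneals to that converted strand is its reverse complement, so it carries an A opposite each converted position. The target sequence and 5mC positions below are hypothetical, positions are 0-based, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_in_silico(sequence: str, mc_positions: set[int]) -> str:
    """Rewrite each listed 0-based 5mC position (expected to be C) as T."""
    bases = list(sequence.upper())
    for position in mc_positions:
        if bases[position] != "C":
            raise ValueError(f"position {position} is {bases[position]}, not C")
        bases[position] = "T"
    return "".join(bases)


def reverse_complement(sequence: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


converted = convert_in_silico("GACGTTCGA", {2, 6})
annealing_oligo = reverse_complement(converted)  # has A opposite each converted T
print(converted, annealing_oligo)
```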
- A adenine
- G guanine
- the use of a polymerase that disfavors uracil can aid in reducing the amplification of treated target nucleic acids that include spurious C to U conversion that may result from use of an engineered deaminase.
- B-family polymerases are known to exhibit a “uracil read-ahead” function, which causes stalling of the polymerase at uracil residues (Greagg et al., 1999, PNAS USA, 96(16):9045-50).
- an engineered deaminase can function in essentially any buffer.
- useful buffers include, but are not limited to: a citrate buffer, such as the citrate buffer available from Thermo Fisher Scientific (Cat. No. 005000); sodium acetate buffer; Bis-Tris propane-HCl; Tris-HCl; and Tris.
- other buffers include, but are not limited to, Bicine, DIPSO, glycylglycine, HEPES, imidazole, malonate, MES, MOPS, PB, phosphate, PIPES, SPG, succinate, TAPS, TAPSO, and tricine.
- a reducing agent such as dithiothreitol (DTT) can be present.
- a deamination reaction can include an engineered deaminase at a concentration from at least 0.5 micromolar (µM) to no greater than 5 µM.
- the concentration of the enzyme can be at least 0.5 µM, at least 1 µM, at least 2 µM, at least 3 µM, at least 4 µM, or 5 µM, and/or no greater than 5 µM, no greater than 4 µM, no greater than 3 µM, no greater than 2 µM, no greater than 1 µM, or 0.5 µM.
- a deamination reaction can include nucleic acids at a concentration from at least 400 nanomolar (nM) to no greater than 2 µM.
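Hitting these ranges from concentrated stocks is a C1V1 = C2V2 calculation. The Python sketch below computes the stock volume to add and checks the final concentration against the enzyme range (0.5 to 5 micromolar) and the nucleic acid range (400 nM to 2 micromolar) stated above. The stock concentrations and the 20 µL reaction volume in the example are hypothetical.

```python
# Ranges from the text, in micromolar (400 nM = 0.4 µM)
ENZYME_RANGE_UM = (0.5, 5.0)
NUCLEIC_ACID_RANGE_UM = (0.4, 2.0)


def stock_volume_ul(stock_um: float, final_um: float, reaction_ul: float) -> float:
    """Volume of stock (µL) giving final_um in reaction_ul, from C1 * V1 = C2 * V2."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_um * reaction_ul / stock_um


def within(value: float, bounds: tuple[float, float]) -> bool:
    """True if value lies in the inclusive range given by bounds."""
    low, high = bounds
    return low <= value <= high


# Hypothetical 20 µL reaction: 2 µM enzyme from a 50 µM stock,
# 1 µM substrate from a 10 µM stock.
enzyme_ul = stock_volume_ul(50.0, 2.0, 20.0)
substrate_ul = stock_volume_ul(10.0, 1.0, 20.0)
assert within(2.0, ENZYME_RANGE_UM) and within(1.0, NUCLEIC_ACID_RANGE_UM)
print(f"enzyme: {enzyme_ul:.2f} µL, substrate: {substrate_ul:.2f} µL")
```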
- a deamination reaction can include an RNase.
- RNase A has been implicated in increasing activity of cytidine deaminases (Bransteitter et al., Proceedings of the National Academy of Sciences of the United States of America 100, no. 7 (2003): 4102-7. doi.org/10.1073/pnas.0730835100).
- when the activity of an engineered deaminase of the present disclosure was determined in the presence of RNase A, the opposite was observed.
- the concentration of RNase A can be at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or at least 9 µg/mL, and/or no greater than 50, 40, 30, 20, 19, 18, 17, 16, 15, 14, 13, 12, or 11 µg/mL.
- the concentration of RNase A is
- target nucleic acids contacted with an engineered deaminase and used in the methods, compositions, and kits provided herein may be essentially any nucleic acid of known or unknown sequence. Sequencing may result in determination of the sequence of the whole or a part of the target molecule.
- target nucleic acids can be processed into templates suitable for amplification by the placement of universal amplification sequences, e.g., sequences present in a universal adaptor, at the ends of each target fragment.
- the primary nucleic acid molecules may represent the entire transcriptome of cells of an organism, e.g., mRNA molecules.
- the primary nucleic acid molecules may represent the entire transcriptome of specific cells of an organism, e.g., from tumor cells or for instance the cells of a tissue.
- the primary nucleic acid molecules may represent a particular subset of mRNA, e.g., mRNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment.
- a biological sample includes tissue that is processed to obtain the desired primary nucleic acids.
- cells are used to obtain the desired primary nucleic acids.
- nuclei are used to obtain the desired primary nucleic acids.
- the method can further include dissociating cells, and/or isolating nuclei from cells. Methods for isolating cells and nuclei from tissue are available (WO 2019/236599).
- examples of methods that can involve fixation include, but are not limited to, whole genome sequencing of isolated nuclei and chromosome conformation capture methods such as Hi-C. Common methods of fixation include perfusion, immersion, freezing, and drying (Srinivasan et al., Am J Pathol. 2002 Dec; 161(6): 1961-1971. doi: 10.1016/S0002-9440(10)64472-0). In some embodiments, such as whole genome sequencing, isolated nuclei can be processed to dissociate nucleosomes from DNA while leaving the nuclei intact; methods for generating nucleosome-free nuclei are available (WO 2018/018008).
- fragmentation can be accomplished using a process often referred to as tagmentation.
- Tagmentation uses a transposome complex and combines into a single step fragmentation and ligation to add universal adapters (WO 2016/130704).
- a transposome complex is a transposase bound to a transposase recognition site and can insert the transposase recognition site into a target nucleic acid in a process sometimes termed “tagmentation.” In some such insertion events, one strand of the transposase recognition site may be transferred into the target nucleic acid.
- a target nucleic acid used in a method, composition, or kit described herein can include a universal adapter attached to each end.
- a target nucleic acid having a universal adapter at each end can be referred to as a "modified target nucleic acid.”
- Methods for attaching a universal adapter to each end of a target nucleic acid used in a method described herein are known to the person skilled in the art.
- the attachment can be through tagmentation using transposase complexes (WO 2016/130704), or through standard library preparation techniques using ligation (U.S. Pat. Pub. No. 2018/0305753). Attachment of a universal adapter to the ends of a target nucleic acid can occur before or after treatment of the target nucleic acid with an engineered deaminase.
- the universal adapters used in the method of the disclosure are referred to as "mismatched" adaptors because the adaptors include a region of sequence mismatch, i.e., they are not formed by annealing fully complementary polynucleotide strands.
- mismatched adaptors are further described in Gormley et al., U.S. Pat. No. 7,741,463, and Bignell et al., U.S. Pat. No. 8,053,192.
- the universal adaptor typically includes universal capture binding sequences that aid in immobilizing the target nucleic acids on an array for subsequent sequencing, and universal primer binding sites useful for the sequencing.
- a universal adapter can optionally include at least one index.
- An index can be used as a marker characteristic of the source of particular target nucleic acids on a flow cell (U.S. Pat. No. 8,053,192).
- the index is a synthetic sequence of nucleotides that is part of the universal adapter which is added to the target nucleic acids as part of the library preparation step.
- an index is a nucleic acid sequence which is attached to each of the target molecules of a particular sample, the presence of which is indicative of, or is used to identify, the sample or source from which the target molecules were isolated.
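Using an index to identify the source sample (demultiplexing) can be sketched as follows. The Python example assumes equal-length index sequences and tolerates up to one mismatch, a common but not universal choice; the sample names and index sequences are hypothetical.

```python
from typing import Optional


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, sample_indexes: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample whose index is within max_mismatches of the observed index.

    Returns None if no index matches or if more than one sample matches.
    """
    matches = [name for name, index in sample_indexes.items()
               if hamming_distance(observed, index) <= max_mismatches]
    return matches[0] if len(matches) == 1 else None
```

A read whose index is `"ACGTACGA"` would be assigned to a sample indexed `"ACGTACGT"` (one mismatch), while an index far from every sample is left unassigned.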
- single-stranded deaminase-treated DNA is prepared for sequencing using a single-stranded library preparation method, as is known in the art. Such methods include, but are not limited to, template switching based second strand synthesis, adapters containing a single-stranded splint overhang, and the like. Reagents for performing single-stranded library preparation methods are commercially available.
- SRSLY single-reaction single-stranded library
- library preparation modifications are made to double-stranded target DNA prior to treatment with engineered deaminase.
- Methods for library preparation of double-stranded DNA template are known in the art, and include Y-adaptor ligation, transposome-based tagmentation, and the like. It will be appreciated by those of skill in the art that methods of double-strand library preparation often include one or more amplification steps using, for example, PCR. In such methods, the amplification step may be deferred until after engineered deaminase treatment, to preserve the methylation status of the template strand.
- methods for amplifying immobilized target nucleic acids include, but are not limited to, bridge amplification and exclusion amplification (also referred to as kinetic exclusion amplification, KEA).
- KEA kinetic exclusion amplification
- a pooled sample can be immobilized in preparation for sequencing. Sequencing can be performed as an array of single molecules or can be amplified prior to sequencing. The amplification can be carried out using one or more immobilized primers.
- the immobilized primer(s) can be, for instance, a lawn on a planar surface, or on a pool of beads.
- the pool of beads can be isolated into an emulsion with a single bead in each "compartment" of the emulsion. At a concentration of only one template per "compartment," only a single template is amplified on each bead.
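One template per compartment is usually approached by dilution, and a common way to reason about it (an assumption added here, not stated in the source) is that templates distribute randomly, so occupancy follows a Poisson distribution with mean λ templates per compartment. The Python sketch below computes the fractions of empty, single-template, and multi-template compartments.

```python
import math


def poisson_occupancy(mean_templates: float) -> dict[str, float]:
    """Fractions of compartments with 0, exactly 1, or >1 templates under Poisson loading."""
    if mean_templates < 0:
        raise ValueError("mean occupancy must be non-negative")
    empty = math.exp(-mean_templates)
    single = mean_templates * math.exp(-mean_templates)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


for lam in (0.1, 0.5, 1.0):
    print(lam, poisson_occupancy(lam))
```

At λ = 1, only about 37% of compartments hold exactly one template and about 26% hold more than one, which is why loading is typically kept well below one template per compartment on average.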
- the features in a patterned surface can be wells in an array of wells (e.g., microwells or nanowells) on glass, silicon, plastic or other suitable solid supports with patterned, covalently linked gel such as poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide) (PAZAM, see, for example, US Pub. No. 2013/184796, WO 2016/066586, and WO 2015/002813).
- PAZAM poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide)
- the process creates gel pads used for sequencing that can be stable over sequencing runs with a large number of cycles.
- the covalent linking of the polymer to the wells is helpful for maintaining the gel in the structured features throughout the lifetime of the structured substrate during a variety of uses.
- Certain embodiments of the disclosure may make use of solid supports that include an inert substrate or matrix (e.g., glass slides, polymer beads, etc.) which has been "functionalized," for example by application of a layer or coating of an intermediate material including reactive groups which permit covalent attachment to biomolecules, such as polynucleotides.
- supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate such as glass.
- the biomolecules e.g., polynucleotides
- the intermediate material e.g., the hydrogel
- the intermediate material may itself be non-covalently attached to the substrate or matrix (e.g., the glass substrate).
- covalent attachment to a solid support is to be interpreted accordingly as encompassing this type of arrangement.
- the terms “cluster” and “colony” are used interchangeably herein to refer to a discrete site on a solid support including a plurality of identical immobilized nucleic acid strands and a plurality of identical immobilized complementary nucleic acid strands.
- the term “clustered array” refers to an array formed from such clusters or colonies. In this context, the term “array” is not to be understood as requiring an ordered arrangement of clusters.
- the amplification method may include ligation probe amplification or oligonucleotide ligation assay (OLA) reactions that contain primers directed specifically to the nucleic acid of interest.
- the amplification method may include a primer extension-ligation reaction that contains primers directed specifically to the nucleic acid of interest.
- as a non-limiting example of primer extension and ligation primers that may be specifically designed to amplify a nucleic acid of interest, the amplification may include primers used for the GoldenGate assay (Illumina, Inc., San Diego, CA) as exemplified by U.S. Pat. Nos. 7,582,420 and 7,611,869.
- Exemplary isothermal amplification methods that may be used in a method of the present disclosure include, but are not limited to, Multiple Displacement Amplification (MDA) as exemplified by, for example Dean et al., Proc. Natl. Acad. Sci. USA 99:5261-66 (2002) or isothermal strand displacement nucleic acid amplification exemplified by, for example U.S. Pat. No. 6,214,587.
- Other non-PCR-based methods that may be used in the present disclosure include, for example, strand displacement amplification (SDA) which is described in, for example Walker et al., Molecular Methods for Virus Detection, Academic Press, Inc., 1995; U.S. Pat. Nos.
- Kinetic exclusion can exploit a relatively slow rate for initiating amplification (e.g., a slow rate of making a first copy of a modified target nucleic acids) vs. a relatively rapid rate for making subsequent copies of the modified target nucleic acids (or of the first copy of the modified target nucleic acids).
- kinetic exclusion occurs due to the relatively slow rate of modified target nucleic acids seeding (e.g., relatively slow diffusion or transport) vs. the relatively rapid rate at which amplification occurs to fill the site with copies of the modified target nucleic acid seed.
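The slow-seeding versus fast-filling contrast can be made concrete with a deliberately simplified model, assumed here and not taken from the source: seeding events at a site arrive randomly (as a Poisson process) at rate k_seed, the first seed fills the site in time t_fill, and a filled site excludes further seeds. The site then stays monoclonal if no second seed arrives during t_fill, with probability exp(-k_seed * t_fill). The rates and times below are hypothetical.

```python
import math


def monoclonal_fraction(seeding_rate_per_min: float, fill_time_min: float) -> float:
    """Probability that no second template seeds a site while the first one fills it.

    Assumes Poisson-distributed seeding events and that a filled site accepts no new seeds.
    """
    if seeding_rate_per_min < 0 or fill_time_min < 0:
        raise ValueError("rates and times must be non-negative")
    return math.exp(-seeding_rate_per_min * fill_time_min)


# Slower seeding relative to filling gives more monoclonal sites.
for rate in (0.1, 0.01, 0.001):
    print(rate, round(monoclonal_fraction(rate, 5.0), 3))
```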
- recombinase-facilitated amplification can be carried out isothermally. It is generally desirable to include ATP, or other nucleotides (or in some cases non-hydrolyzable analogs thereof) in a recombinase-facilitated amplification reagent to facilitate amplification.
- a mixture of recombinase and single-stranded binding (SSB) protein is particularly useful as SSB can further facilitate amplification.
- Exemplary formulations for recombinase-facilitated amplification include those sold commercially as TwistAmp kits by TwistDx (Cambridge, UK). Useful components of recombinase-facilitated amplification reagent and reaction conditions are set forth in US 5,223,414 and US 7,399,590.
- the methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleotide base type from another, are particularly applicable.
- the process to determine the nucleotide sequence of a modified target nucleic acid can be an automated process. Preferred embodiments include sequencing-by-synthesis (“SBS”) techniques.
- SBS sequencing-by-synthesis
- a nucleotide monomer includes locked nucleic acids (LNAs) or bridged nucleic acids (BNAs).
- LNAs locked nucleic acids
- BNAs bridged nucleic acids
- the terminator can be effectively irreversible under the sequencing conditions used, as is the case for traditional Sanger sequencing which utilizes dideoxynucleotides, or the terminator can be reversible, as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
- Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) “Real-time DNA sequencing using detection of pyrophosphate release.” Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) “Pyrosequencing sheds light on DNA sequencing.” Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P.
- PPi inorganic pyrophosphate
- ATP adenosine triphosphate
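A toy flowgram decoder illustrates the pyrosequencing principle: nucleotides are dispensed one type at a time in a fixed flow order, and the light signal for each flow is roughly proportional to the number of identical bases incorporated. The Python sketch below rounds each signal to a homopolymer length; the flow order and signal values are hypothetical, and real base callers also correct for signal decay, background, and phasing.

```python
def decode_flowgram(flow_order: str, signals: list[float]) -> str:
    """Call bases from per-flow signals by rounding each to a homopolymer length."""
    calls = []
    for flow_index, signal in enumerate(signals):
        nucleotide = flow_order[flow_index % len(flow_order)]  # flow order repeats
        homopolymer_length = max(0, round(signal))
        calls.append(nucleotide * homopolymer_length)
    return "".join(calls)


print(decode_flowgram("TACG", [1.02, 0.05, 1.96, 0.1, 0.0, 0.97, 0.03, 1.1]))
```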
- the nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of nucleotides at the features of the array.
- An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images.
- the images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
- three nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal.
- one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels.
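A two-channel scheme of this kind can be decoded with a simple lookup: one base is labeled in both channels, two bases in one channel each, and the unlabeled fourth base is called from the absence of signal. In the Python sketch below, the specific base-to-channel assignment and the 0.5 intensity threshold are assumptions for illustration, not a description of any particular instrument's chemistry.

```python
# Hypothetical assignment: (signal in channel 1, signal in channel 2) -> base
CHANNEL_CODE = {
    (True, True): "A",    # labeled for detection in both channels
    (True, False): "C",   # channel 1 only
    (False, True): "T",   # channel 2 only
    (False, False): "G",  # unlabeled ("dark") base, called from absence of signal
}


def call_base(channel_1: float, channel_2: float, threshold: float = 0.5) -> str:
    """Call one base from normalized intensities in two detection channels."""
    return CHANNEL_CODE[(channel_1 >= threshold, channel_2 >= threshold)]


def call_read(cycles: list[tuple[float, float]], threshold: float = 0.5) -> str:
    """Call a read from per-cycle (channel_1, channel_2) intensity pairs."""
    return "".join(call_base(c1, c2, threshold) for c1, c2 in cycles)
```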
- Some embodiments can use sequencing by ligation techniques. Such techniques use DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides.
- the oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize.
- images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature, but the relative position of the features will remain unchanged in the images.
- some embodiments can use nanopore sequencing (see, e.g., “A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution,” J. Am. Chem. Soc. 130, 818-820 (2008)). Data obtained from nanopore sequencing can be stored, processed and analyzed as set forth herein. In particular, the data can be treated as an image in accordance with the exemplary treatment of optical images and other images that is set forth herein.
- Some embodiments can use methods involving the real-time monitoring of DNA polymerase activity.
- Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. Nos. 7,329,492 and 7,211,414, or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No. 7,315,019, and using fluorescent nucleotide analogs and engineered polymerases as described, for example, in U.S. Pat. No. 7,405,281 and U.S. Pub. No. 2008/0108082.
- FRET fluorescence resonance energy transfer
- the converted single-stranded DNA can then be processed as needed to facilitate hybridization to a microarray.
- the converted DNA can be amplified. Any one of a number of amplification methods as are known in the art can be performed. For example, whole-genome amplification or amplification using universal primers that hybridize to a common region in the converted DNA, such as an adaptor sequence, can be used. Additionally or alternatively, the converted DNA can be fragmented. Fragmentation can be performed prior to or following amplification, or in the absence of amplification. Any one of a number of fragmentation methods as are known in the art can be performed.
- the terms “organism” and “subject” are used interchangeably and refer to microbes (e.g., prokaryotic or eukaryotic), animals, and plants.
- An example of an animal is a mammal, such as a human.
- the term “target nucleic acid” is intended as a semantic identifier for the nucleic acid in the context of a method or composition or kit set forth herein and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated.
- Reference to a nucleic acid such as a target nucleic acid includes both single-stranded and double-stranded nucleic acids, and both DNA and RNA, unless indicated otherwise.
- the term “library” refers to the collection of target nucleic acids containing known common sequences, such as a universal sequence or adapter, at their 3' and 5' ends.
- An amplicon can be a nucleic acid molecule having a single copy of a particular nucleotide sequence (e.g., a PCR product) or multiple copies of the nucleotide sequence (e.g., a concatemeric product of RCA).
- a first amplicon of a target nucleic acid is typically a complementary copy.
- Subsequent amplicons are copies that are created, after generation of the first amplicon, from the target nucleic acid or from the first amplicon.
- a subsequent amplicon can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid.
- PCR polymerase chain reaction
- the method is referred to as PCR.
- when the desired amplified segments of the polynucleotide of interest become the predominant nucleic acid sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
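Why amplified segments come to dominate the mixture follows from the idealized arithmetic of exponential amplification, N = N0 (1 + E)^n, where N0 is the starting copy number, n the number of cycles, and E the per-cycle efficiency (1.0 for perfect doubling). This Python sketch evaluates that expression; it ignores the plateau phase and reagent depletion, so it only approximates early cycles.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N = N0 * (1 + E) ** n, with 0 <= E <= 1."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(expected_copies(100, 10))       # 102400.0 with perfect doubling
print(expected_copies(100, 10, 0.9))  # fewer copies at 90% efficiency
```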
- the target nucleic acid molecules can be PCR amplified using a plurality of different primer pairs, in some cases, one or more primer pairs per target nucleic acid molecule of interest, thereby forming a multiplex PCR reaction.
- multiplex amplification refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel.
- the “plexy” or “plex” of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.
- as used herein, the terms “array,” “analyte array,” and “microarray” are used interchangeably and refer to a population of sites that can be differentiated from each other according to relative location. Different molecules that are at different sites of an array can be differentiated from each other according to the locations of the sites in the array.
- An individual site of an array can include one or more molecules of a particular type. For example, a site can include a single target nucleic acid molecule having a particular sequence or a site can include several nucleic acid molecules having the same sequence (and/or complementary sequence, thereof). The sites of an array can be different features located on the same substrate.
- Exemplary features include without limitation, droplets, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate or channels in a substrate.
- the sites of an array can be separate substrates each bearing a different molecule. Different molecules attached to separate substrates can be identified according to the locations of the substrates on a surface to which the substrates are associated or according to the locations of the substrates in a liquid or gel.
- Exemplary arrays in which separate substrates are located on a surface include, without limitation, those having beads in wells.
- the term “compartment” is intended to mean an area or volume that separates or isolates something from other things.
- exemplary compartments include, but are not limited to, vials, tubes, wells, droplets, boluses, beads, vessels, surface features, flow cell, or areas or volumes separated by physical forces such as fluid flow, magnetism, electrical current or the like.
- a compartment is a well of a multi-well plate, such as a 96- or 384-well plate.
- a droplet may include a hydrogel bead, which is a bead for encapsulating one or more nuclei or cells and includes a hydrogel composition.
- the term “flow cell” refers to a chamber comprising a solid surface across which one or more fluid reagents can be flowed.
- flow cells and related fluidic systems and detection platforms that can be readily used in the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281, and US 2008/0108082.
- while polynucleotide sequences encoding an engineered deaminase are described herein as DNA sequences, it is understood that the complements, reverse sequences, and reverse complements of the DNA sequences can be easily determined by the skilled person. It is also understood that the sequences described herein as DNA sequences can be converted from a DNA sequence to an RNA sequence by replacing each thymidine nucleotide with a uracil nucleotide.
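These conversions are mechanical, as a short Python sketch shows; it assumes uppercase sequences containing only A, C, G, and T and does not handle IUPAC ambiguity codes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)


def reverse(seq: str) -> str:
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def to_rna(seq: str) -> str:
    """Replace each thymidine (T) with uracil (U)."""
    return seq.replace("T", "U")


print(complement("ATGCGT"), reverse("ATGCGT"), reverse_complement("ATGCGT"), to_rna("ATGCGT"))
```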
- Example 1: In the following example, the inventors describe four experiments that demonstrate that a modified tRNA deaminase possesses single-stranded DNA (ssDNA) cytidine deaminase activity. Experiment 1: USER cleavage assay.
- an ssDNA substrate containing 17 unmethylated cytosines (C) and 16 methylated cytosines (mC) (FIG. 2C) was incubated with met7 at 70°C for 16 hours. The oligo was then PCR-amplified, indexed, and sequenced on an Illumina MiniSeq (FIG. 2B). C and mC deamination events were quantified as C to T mutations at each site.
- a catalytically inactive met7 mutant (E46A) was used as a negative control. met7 showed significantly higher deamination across all sites in the ssDNA oligo as compared to the E46A mutant (FIG. 2D), demonstrating that met7 is a bona fide ssDNA cytidine deaminase.
- met7 residues that are important for its deaminase activity (i.e., C67, C70, and E46) were identified via comparison of structural models of met7 and APOBEC3A (FIG. 3C), and met7 variants carrying alanine substitutions at these residues (C67A, C70A, and E46A) were generated for use as catalytically inactive negative control enzymes.
- cell pellets were resuspended in 1 mL lysis buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, 1 mM DTT, 10 mM imidazole, and 1X GoldBio Protease Inhibitor Cocktail) and cells were disrupted via sonication on ice (5 seconds on at 90% amplitude, 30 seconds off, total sonication time 90 seconds). Then, the lysate was clarified by centrifugation and filtration.
- The experiments in Example 1 demonstrate that a modified tRNA deaminase possesses ssDNA cytidine deaminase activity.
- the inventors identify additional modifications to the tRNA deaminase that further increase this enzyme’s deaminase activity.
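The note above on DNA sequences (complements, reverse sequences, reverse complements, and conversion from DNA to RNA) can be illustrated with a short Python sketch. The function names are illustrative only, and the sketch assumes plain A/C/G/T strings without IUPAC ambiguity codes.

```python
# Translation table for base complementation (upper- and lower-case A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (the opposite strand read 5'->3')."""
    return complement(seq)[::-1]


def dna_to_rna(seq: str) -> str:
    """Convert DNA to RNA by replacing each thymine (T) with uracil (U)."""
    return seq.replace("T", "U").replace("t", "u")


print(reverse_complement("ATGCGT"))  # ACGCAT
```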
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Medicinal Chemistry (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Enzymes And Modification Thereof (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present disclosure provides engineered deaminases that bind to single-stranded DNA (ssDNA) and deaminate ssDNA at a temperature that destabilizes ssDNA secondary structure. Compositions comprising the engineered deaminases, polynucleotides encoding the engineered deaminases, and methods of using the engineered deaminases to detect the locations of modified cytosines in nucleic acids are also provided.
Description
THERMOPHILIC DEAMINASE AND METHODS FOR IDENTIFYING MODIFIED CYTOSINE
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to U.S. Provisional Application No. 63/589,607 filed on October 11, 2023, the contents of which are incorporated by reference in their entireties.
SEQUENCE LISTING
This application includes a sequence listing in XML format titled “144194_00019_ST26.xml”, which is 79,847 bytes in size and was created on October 11, 2024. The sequence listing is electronically submitted with this application via Patent Center and is incorporated herein by reference in its entirety.
BACKGROUND
Cytidine deaminases are integral to nucleotide metabolism and antiviral defense. Single-stranded DNA (ssDNA) cytidine deaminases, such as the APOBEC family of enzymes, show varying activity on 5-methylcytosine (5mC) and its oxidized derivatives. The differential deamination of cytosine derivatives by APOBEC has been exploited to map 5mC positions in the genome. However, APOBEC-mediated enzymatic mapping of 5mC is ineffective in secondary structure-rich DNA, as the cytosine bases within such DNA are physically inaccessible. DNA secondary structures can be destabilized by increasing the reaction temperature. However, APOBEC enzymes are mesophilic and exhibit decreased activity above 37°C, rendering this strategy impractical. Accordingly, there is a remaining need in the art for methods of mapping 5mC in secondary structure-rich DNA.
SUMMARY
In a first aspect, the present disclosure provides engineered deaminases that bind to and deaminate single-stranded DNA (ssDNA) at a temperature that destabilizes ssDNA secondary structure. The engineered deaminases comprise one or more modifications relative to a wild-type enzyme. In some embodiments, the wild-type enzyme is a tRNA deaminase.
In a second aspect, the present disclosure provides compositions comprising an engineered deaminase described herein and a buffer.
In a third aspect, the present disclosure provides polynucleotides encoding an engineered deaminase described herein.
In a fourth aspect, the present disclosure provides vectors comprising a polynucleotide described herein.
In a fifth aspect, the present disclosure provides cells that express an engineered deaminase described herein.
In a sixth aspect, the present disclosure provides methods that comprise contacting a sample comprising ssDNA suspected of having at least one modified cytosine (modC) with an engineered deaminase described herein under conditions suitable for conversion of the modC by deamination. In preferred embodiments, the modified cytosine is 5-methylcytosine (5mC) or 5-hydroxymethylcytosine (5hmC).
In a seventh aspect, the present disclosure provides methods that comprise processing a sample of DNA suspected of comprising dsDNA comprising at least one modC to produce a sequencing library, denaturing the sequencing library to produce ssDNA, contacting the ssDNA with an engineered deaminase described herein under conditions suitable for conversion of the modC by deamination, and converting the converted ssDNA into a converted dsDNA sequencing library.
In an eighth aspect, the present disclosure provides methods of detecting the location of a modC in a target nucleic acid. These methods comprise contacting target nucleic acids suspected of comprising at least one modC with an engineered deaminase described herein to produce converted nucleic acids comprising at least one converted cytosine and detecting the at least one converted cytosine in the converted nucleic acids.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows structural models of the full-length CDAT8 protein (right) and its N-terminal deaminase domain, i.e., met7 (left). The CDAT8 structure is adapted from Randau et al. (Science 324(5927):657-659, 2009).
FIG. 2A shows the results of the USER cleavage assay. In this assay, a 7,249-nucleotide circular ssDNA substrate was incubated with met7 at 70°C for 4 hours, treated with the enzyme USER at 37°C for 1 hour to cleave uracil-containing DNA, and resolved on a TBE-urea gel that was subsequently stained with SYBR Gold.
FIG. 2B is a schematic representation of the NGS assay. The ssDNA oligo substrate contains 17 unmethylated cytosines (C) and 16 methylated cytosines (mC). Deamination by met7 results in conversion of unmethylated C residues to uracil and conversion of mC to thymine. Both types of deamination event are detected as C to T transition mutations via DNA sequencing.
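The quantification step of the NGS assay, counting C-to-T changes at each cytosine of the substrate and splitting sites by whether they carry C or mC, might be sketched as follows. This is a simplified illustration, not the inventors' pipeline: it assumes reads have already been aligned to the reference without insertions or deletions, so that read index i corresponds to reference index i, and the example sequences are hypothetical rather than SEQ ID NO: 4.

```python
def c_to_t_rates(reference, reads, methylated_positions):
    """For every C in `reference`, return (site label, fraction of reads showing T).

    Assumes gap-free alignment: reads[k][i] is the base observed at reference index i.
    Sites in `methylated_positions` (0-based) are labelled "mC", all others "C".
    Bases other than C or T at a C site (e.g. errors, N calls) are ignored.
    """
    rates = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        observed = [read[pos] for read in reads if pos < len(read) and read[pos] in "CT"]
        if not observed:
            continue  # no informative coverage at this site
        fraction_t = observed.count("T") / len(observed)
        label = "mC" if pos in methylated_positions else "C"
        rates[pos] = (label, fraction_t)
    return rates


reference = "ACGTCA"  # hypothetical substrate with C at indices 1 and 4
reads = ["ATGTCA", "ATGTCA", "ACGTTA", "ACGTCA"]
print(c_to_t_rates(reference, reads, methylated_positions={4}))
# {1: ('C', 0.5), 4: ('mC', 0.25)}
```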
FIG. 2C shows the sequence of the ssDNA substrate used in the NGS assay (SEQ ID NO: 4). Methylated C is depicted as “/iMe-dC/.”
FIG. 2D shows results of performing the NGS assay with wild-type met7 (top) and a catalytically inactive met7 mutant (E46A) (bottom). The graphs on the left show the number and frequency of C (yellow) and mC (blue) deamination events. (Note: The green area represents the overlap between these data.) The graphs on the right show the percentage of deamination that occurs within various sequence motif contexts.
FIG. 3A is a schematic representation of the SwaI assay. “XC” denotes the single cytosine base in a FAM-labelled ssDNA substrate. Deamination of the substrate by met7 results in the formation of a SwaI restriction site. SwaI can cleave the substrate following deamination and the subsequent addition of a complementary oligonucleotide. Cleaved and uncleaved substrate can then be visualized via 15% urea-PAGE using a FAM filter.
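The logic of the SwaI readout, in which a single C-to-U deamination completes the SwaI recognition sequence ATTTAAAT, can be checked in a few lines of Python. The substrate below is a hypothetical stand-in (the actual substrate is SEQ ID NO: 5, which is not reproduced here), and reading U as T when matching the site is a simplification for illustration.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence; the enzyme cuts blunt at ATTT^AAAT


def deaminate(seq: str, pos: int) -> str:
    """Return `seq` with the cytosine at 0-based index `pos` converted to uracil."""
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not C")
    return seq[:pos] + "U" + seq[pos + 1:]


def has_swai_site(seq: str) -> bool:
    """True if the sequence contains a SwaI site, reading U as T (simplification)."""
    return SWAI_SITE in seq.replace("U", "T")


substrate = "GGATTCAAATGG"  # hypothetical: the single C sits inside ATT?AAAT
product = deaminate(substrate, 5)
print(has_swai_site(substrate), has_swai_site(product))  # False True
```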
FIG. 3B shows the sequence of the ssDNA substrate used in the SwaI assay (SEQ ID NO: 5). “XC” denotes the single cytosine base. “/36-FAM/” denotes a 3’ FAM label.
FIG. 3C shows structural models of APOBEC3A (PDB: 5KEG) and met7 (PDB: 3G8Q), which were superimposed using PyMOL and have a root mean square deviation (RMSD) value of 4.35 Å. The triad of residues that coordinate Zn(II) in APOBEC3A, i.e., H70, C101, and C106, correspond to residues H44, C67, and C70 in met7, respectively. In addition, E46 of met7 lies proximal to the cytosine base and forms a hydrogen bonding interaction.
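The RMSD quoted above summarizes how far corresponding atoms of the two superimposed models lie from one another: RMSD = sqrt((1/N) * sum over i of |a_i - b_i|^2) for N paired atoms. A minimal Python sketch of that formula follows. It assumes the structures have already been superimposed (as was done here in PyMOL) and that atoms are already paired one-to-one; PyMOL's alignment commands may also discard outlier atom pairs during refinement, so a value computed this way over all pairs need not match the reported 4.35 Å.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index;
    no rotation or translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(coords_a))


# Two hypothetical atom pairs, each offset by 1 Å, give an RMSD of 1.0 Å.
print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))  # 1.0
```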
FIG. 3D shows results of performing the SwaI assay with wild-type met7, three catalytically inactive met7 mutants (i.e., C67A, C70A, and E46A), and wild-type APOBEC3A (A3A) using a ssDNA substrate comprising a single cytosine (depicted in FIG. 3B). These assays were performed both following pre-incubation of the enzymes at 70°C (right) and without any pre-incubation (left). “Cut” denotes the cleaved ssDNA substrate that is indicative of cytosine deamination, and “uncut” denotes the full-length, uncleaved ssDNA substrate.
FIG. 4 shows the results of performing the SwaI assay with wild-type APOBEC3A (left), with wild-type met7 (middle), and without deaminase protein (right) using ssDNA substrates comprising a single unmethylated cytosine (C), methylated cytosine (mC), hydroxymethylcytosine (hmC), formylcytosine (fC), or carboxylcytosine (caC). These assays were performed at both 37°C (top) and 70°C (bottom).
FIG. 5 shows the results of performing the SwaI assay at 80°C with wild-type met7 (right) or without deaminase protein (left) using ssDNA substrates comprising a single unmethylated cytosine (C oligo), methylated cytosine (5mC oligo), hydroxymethylcytosine (5hmC oligo), formylcytosine (5fC oligo), or carboxylcytosine (5caC oligo).
FIG. 6 shows structural models of APOBEC3A (PDB: 5KEG) and met7 (PDB: 3G8Q), which were superimposed using PyMOL. Four loops within met7 that can potentially interact with the bound oligonucleotide substrate are highlighted in black. Surface charges are visualized as vacuum electrostatics generated by PyMOL (blue: basic residues; red: acidic residues).
FIG. 7 shows results of performing the NGS assay with met7 variants comprising various substitutions at residues A64 and G88. The graphs show the number and frequency of C (yellow) and mC (blue) deamination events. (Note: The green area represents the overlap between these data.)
FIG. 8 shows results of performing the NGS assay with met7 variants comprising various substitutions at residues D41, D42, and G90 using the ssDNA substrate depicted in FIG. 2C. Results were also generated with the wild-type met7 enzyme (WT) and a catalytically inactive met7 variant (E46A).
DETAILED DESCRIPTION
The present disclosure provides engineered deaminases that bind to and deaminate single-stranded DNA (ssDNA) at a temperature that destabilizes ssDNA secondary structure. Compositions and kits comprising the engineered deaminases, polynucleotides encoding the engineered deaminases, and methods of using the engineered deaminases to detect the locations of modified cytosine (modC) in nucleic acids are also provided.
In Example 1, the inventors demonstrate that a truncated tRNA deaminase comprising only the deaminase domain of the wild-type Methanopyrus kandleri enzyme CDAT8 exhibits ssDNA cytidine deaminase activity. This deaminase domain is referred to herein as met7. In Example 2, they demonstrate that additional modifications to met7 further increase this enzyme’s deaminase activity. Importantly, the inventors also demonstrate that met7 and variants thereof are active at temperatures that destabilize ssDNA secondary structures and dsDNA, e.g., around 60-80°C. As a result, these deaminases can be used to map modC within secondary structure-rich DNA. This high-temperature stability also makes these deaminases amenable to lysate-based mutant screening, as contaminating nucleases in the lysate can be heat-inactivated without compromising the deaminases’ activity.
Engineered Deaminases
The present disclosure provides engineered deaminases that bind to and deaminate single-stranded DNA (ssDNA) at a temperature that destabilizes ssDNA secondary structure. The engineered deaminases comprise one or more modifications relative to a wild-type enzyme.
Deamination is the removal of an amino group from a molecule. A “deaminase” is an enzyme that catalyzes a deamination reaction. The engineered deaminases of the present invention deaminate the nucleoside cytidine and particular variants thereof, such as methylcytidine or hydroxymethylcytidine, in the context of ssDNA. For example, cytidine (C) may be converted to uridine (U), methylcytidine (mC) may be converted to thymidine (T), and hydroxymethylcytidine (hmC) may be converted to hydroxymethyluridine (hmU). Thus, when these engineered deaminases “deaminate ssDNA,” they are deaminating cytidine and variants thereof within ssDNA.
The deaminases of the present invention are “engineered,” meaning that they have been altered by the hand of man. Specifically, the engineered deaminases have been altered to comprise one or more modifications relative to a wild-type enzyme. As used herein, the term “wild-type enzyme” refers to the typical form of an enzyme that occurs in nature. In the Examples, the inventors generated an engineered deaminase (i.e., met7, SEQ ID NO: 2) by truncating the wild-type enzyme CDAT8 (SEQ ID NO: 1) to include just its deaminase domain. CDAT8 is a tRNA deaminase from the thermophilic archaeon Methanopyrus kandleri that deaminates cytidine (C) to uridine (U) at position C8 of tRNAs in this organism (Randau et al., A cytidine deaminase edits C to U in transfer RNAs in Archaea. Science 324(5927):657-659, 2009). CDAT8 consists of a C-terminal THUMP domain that binds tRNA, a central ferredoxin-like domain (FLD), and an N-terminal deaminase domain (i.e., met7). The domain architecture of CDAT8 is shown in FIG. 1. Thus, in some embodiments, the wild-type enzyme is a tRNA deaminase. In specific embodiments, the wild-type enzyme is CDAT8 (SEQ ID NO: 1) or a homolog thereof. Suitable examples of CDAT8 homologs are disclosed as SEQ ID NOs: 6-68, as detailed in Table 1. Those of skill in the art can identify additional CDAT8 homologs using a publicly available database such as the Protein database available at the National Center for Biotechnology Information (ncbi.nlm.nih.gov/protein).
Table 1. Examples of CDAT8 homologs
The engineered deaminases of the present invention comprise one or more modifications relative to a wild-type enzyme. As used herein, the term “modification” simply refers to a difference relative to the wild-type enzyme. In some embodiments, the one or more modifications comprise a “mutation,” i.e., a difference in an amino acid sequence relative to a reference sequence. Mutations include insertions, deletions, and substitutions of an amino acid residue relative to a reference sequence. In other embodiments, the one or more modifications comprise a “supercharged surface modification,” i.e., a modification in which one or more amino acids on a protein surface are mutated to introduce additional positive charges to the surface. In other embodiments, the one or more modifications comprise fusion to a nucleic-acid binding domain. Examples of suitable nucleic-acid binding domains include, without limitation, the DNA binding domain of a DNA binding protein such as NeqSSBi, gp32, Rad51, SsbA, Sso7d, or MazE, or the RNA binding domain of an RNA binding protein such as PUF, Tat, or B2. In other embodiments, the one or more modifications comprise fusion to a guide nucleic acid that targets the engineered deaminase to a particular target DNA sequence.
As is noted above, “met7” is the isolated deaminase domain of the tRNA deaminase CDAT8. met7 was generated via truncation of the wild-type enzyme CDAT8 to include just the deaminase domain. Thus, in some embodiments, the one or more modifications comprise a truncation. The amino acid sequence of met7 is provided as SEQ ID NO: 2. The amino acid sequence of the wild-type met7 protein tested in the Examples (i.e., SEQ ID NO: 3) additionally comprises a 6x-His tag for purification via Ni-NTA affinity chromatography. Surprisingly, met7 binds to DNA without any further modification. Thus, in some embodiments, the engineered deaminase comprises or consists essentially of met7 (SEQ ID NO: 2 or SEQ ID NO: 3). In other embodiments, the engineered deaminase consists essentially of the deaminase domain of a CDAT8 homolog.
In the Examples, the inventors also generated variants of met7 with increased deaminase activity by introducing substitution mutations into various active site-adjacent residues of met7. Thus, in some embodiments, the one or more modifications comprise a substitution mutation. Active site-adjacent residues are amino acid residues that are close to the active site of an enzyme. Examples of active site-adjacent residues found in met7 include, without limitation, P20, R22, T23, A40, D41, D42, E43, T63, A64, P66, G86, R87, G88, R89, and G90. Specifically, the inventors have shown that the following point mutations increase the deaminase activity of met7: A64K, A64L, A64M, A64R, A64W, G88A, G88C, G88D, G88F, G88H, G88I, G88K, G88L, G88M, G88N, G88P, G88Q, G88R, G88S, G88V, G90W, G90K, D42K, D42R, D41K, D41R, T23K, and E43K. Thus, in some embodiments, the engineered deaminase comprises a substitution mutation at one or more of these residues or at a functionally equivalent residue.
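Substitutions above are written as wild-type residue, position, new residue (e.g., A64K replaces alanine 64 with lysine). A small Python helper that parses this notation, checks the stated wild-type residue, and applies one or more substitutions could look like the following. It assumes 1-based numbering from the first residue of whatever sequence is passed in; this passage does not state whether positions refer to SEQ ID NO: 2 or the His-tagged SEQ ID NO: 3, so the toy sequence below is hypothetical.

```python
import re

# One-letter code, 1-based position, one-letter code (e.g. "A64K").
_MUTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply point substitutions such as "A64K" to a protein sequence, in order.

    Raises ValueError if the notation is malformed, the position is out of range,
    or the residue at that position does not match the stated wild-type residue.
    """
    residues = list(protein)
    for mutation in mutations:
        match = _MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation notation: {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position {position} is out of range")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: residue {position} is {residues[position - 1]}, not {wild_type}"
            )
        residues[position - 1] = new
    return "".join(residues)


print(apply_substitutions("MADGA", ["A2K", "G4R"]))  # MKDRA
```

Checking the wild-type residue before substituting guards against off-by-one numbering mistakes, for example when a tag shifts the residue numbering.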
As used herein, a “functionally equivalent residue” is an amino acid residue that serves the same functional role as the corresponding residue in a reference protein (i.e., CDAT8). Often, functionally equivalent residues occur at homologous positions in the amino acid sequences of the wild-type enzymes (e.g., positions that are aligned in an amino acid sequence alignment). Hence, the term “functionally equivalent residue” also encompasses substitutions that are “positionally equivalent” or “homologous” to a disclosed substitution, regardless of whether or not the particular function of the residue is known. It is possible to identify the locations of functionally equivalent and positionally equivalent residues in the amino acid sequences of two or more enzymes via a sequence alignment and/or molecular modelling.
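Once two sequences are aligned, locating the positionally equivalent residue amounts to walking the alignment columns while counting non-gap residues in each sequence. The Python sketch below assumes the alignment is already available as two gapped strings of equal length (for example, the output of a global alignment such as Needleman-Wunsch); the function name and the toy alignment are hypothetical.

```python
def equivalent_position(aligned_ref: str, aligned_query: str, ref_pos: int):
    """Return the 1-based query position aligned to 1-based `ref_pos` in the reference.

    Gaps are written as "-". Returns None when the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_index = query_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_pos:
            return query_index if query_char != "-" else None
    raise ValueError(f"reference position {ref_pos} is beyond the aligned reference")


# Toy alignment: reference MADGA vs a homolog with an inserted K and no C-terminal A.
print(equivalent_position("MA-DGA", "MAKDG-", 3))  # 4 (reference D3 -> homolog D4)
print(equivalent_position("MA-DGA", "MAKDG-", 5))  # None (aligned to a gap)
```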
In the Examples, the inventors demonstrate that met7 converts unmethylated cytosine (C) to uracil (U) and converts methylated cytosine (mC) to thymine (T). Thus, in some embodiments, the engineered deaminase converts cytosine (C) to uridine (U) in ssDNA and/or converts methylcytosine (mC) to thymine (T) in ssDNA. Both of these conversions appear in sequencing data as C:G to T:A transitions, which can be identified by comparing sequencing reads to a reference sequence. Notably, some of the met7 variants tested by the inventors exhibit a preference for mC over C. met7 and the variants thereof described herein can be used as is to map mC if an additional blocking step is included in the method prior to treatment with the deaminase. In the Examples, the inventors demonstrate that met7 is active on unmethylated cytosine (C), methylated cytosine (mC), and hydroxymethylcytosine (hmC), but is not active on formylcytosine (fC) or carboxylcytosine (caC) (FIG. 4). Thus, if all unmodified C in a DNA sequence are converted to caC prior to treatment with met7, met7 will only be able to deaminate mC and hmC, allowing it to be used for mapping of these modified bases.
TET2-based methods can also be used to distinguish between modified and unmodified C. In such methods, TET2 is used to oxidize modified C (mC and hmC), which protects them from cytidine deamination. Then, a cytidine deaminase is used to convert unprotected, unmodified C to U, and the positions of modified C are inferred from the absence of a C-to-T change at those positions. For a detailed description of such methods, see the NEBNext Enzymatic Methyl-seq (EM-seq™) product page (www.neb.com/products/e7120-nebnext-enzymatic-methyl-seq- kit#Product%20Information) and Fullgrabe et al. (Simultaneous sequencing of genetic and epigenetic bases in DNA. Nat Biotechnol, 2023), which are hereby incorporated by reference in their entirety. Other methods for distinguishing between C modification states may also be utilized, including those described by Wang et al. (Direct enzymatic sequencing of 5-methylcytosine at single-base resolution. Nat Chem Biol 19, 1004-1012, 2023).
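The two readouts described above are mirror images. With met7 after blocking unmodified C (e.g., as caC), a T at a reference C indicates a modified base; in a TET2-protection workflow, a retained C indicates a modified base. A per-site estimate under either scheme might be sketched as follows. The scheme labels are hypothetical names, and the sketch ignores incomplete conversion, incomplete protection, and sequencing error.

```python
def modified_fraction(observed_bases, scheme):
    """Estimate the fraction of molecules carrying a modified C at one reference C site.

    observed_bases: bases seen in reads at that site; only C and T are informative.
    scheme:
      "met7_blocked"   -> unmodified C blocked before deamination, so T = modified C
      "tet2_protected" -> mC/hmC protected by TET2 oxidation, so C = modified C
    """
    calls = [base for base in observed_bases if base in ("C", "T")]
    if not calls:
        return None  # no informative reads at this site
    fraction_t = calls.count("T") / len(calls)
    if scheme == "met7_blocked":
        return fraction_t
    if scheme == "tet2_protected":
        return 1.0 - fraction_t
    raise ValueError(f"unknown scheme: {scheme!r}")


reads_at_site = ["T", "T", "C", "T", "A"]  # the A call is ignored
print(modified_fraction(reads_at_site, "met7_blocked"))    # 0.75
print(modified_fraction(reads_at_site, "tet2_protected"))  # 0.25
```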
Additional methods may be employed to distinguish between mC and hmC. One such method is described by Fullgrabe et al. In this method, 5hmC is glycosylated using beta-glycosyltransferase, and methylation at 5mC is then selectively and enzymatically copied across the CpG unit to the C on the copy strand using DNA methyltransferase 5 (DNMT5). The DNA is then treated with TET2 to oxidize mC (5hmC remains protected by glycosylation), and a cytidine deaminase is used to convert unprotected, unmodified C to U.
In the Examples, the inventors demonstrate that met7 has increased ssDNA deaminase activity relative to CDAT8. Thus, in some embodiments, the engineered deaminase has increased ssDNA deaminase activity relative to the wild-type enzyme. Suitably, the ssDNA deaminase activity of the engineered deaminase is at least 25% greater, at least 50% greater, at least 100% (2-fold) greater, at least ten times greater, at least 20-times greater, or at least 100-times greater than the ssDNA deaminase activity of the wild-type enzyme. ssDNA deaminase activity can be assessed using a variety of in vitro assays, including the USER cleavage assay, NGS assay, and SwaI assay described in Example 1.
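Because “X% greater” and “N-fold” are easy to conflate, the arithmetic behind the thresholds above is spelled out in a short Python sketch: an activity that is 100% greater than the wild-type activity is 2-fold the wild-type value. How activity is measured (for example, the fraction of substrate deaminated in one of the Example 1 assays) is left as an assumption, and the numbers are hypothetical.

```python
def relative_activity(engineered: float, wild_type: float) -> tuple[float, float]:
    """Return (fold change, percent increase) of engineered vs wild-type activity."""
    if wild_type <= 0:
        raise ValueError("wild-type activity must be positive")
    fold = engineered / wild_type
    percent_increase = (fold - 1.0) * 100.0
    return fold, percent_increase


# Hypothetical activities in arbitrary units (e.g. % of substrate deaminated).
fold, pct = relative_activity(30.0, 10.0)
print(fold, pct)                          # 3.0 200.0
print(pct >= 25, pct >= 100, fold >= 10)  # True True False
```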
Importantly, the engineered deaminases of the present invention bind to and deaminate ssDNA at a temperature that destabilizes ssDNA secondary structure. As used herein, a “temperature that destabilizes ssDNA secondary structure” is a temperature at which the secondary structures present in ssDNA unfold. The exact temperature needed to destabilize ssDNA secondary structure depends largely on the specific sequence of the ssDNA, but generally falls within the range of 50°C to 60°C. Thus, the engineered deaminases of the present invention deaminate ssDNA at temperatures greater than 50°C.
As used herein, the term “deamination profile” refers to the preference of a deaminase to deaminate one form of cytosine over another. In some embodiments, the engineered deaminase has an altered deamination profile as compared to the wild-type enzyme. The present disclosure provides three types of engineered deaminases with three different deamination profiles. One type of engineered deaminase preferentially deaminates methylated cytosine (5mC) instead of cytosine (C) (i.e., converts 5mC to thymine (T) at a greater rate than it converts C to uridine (U)) and is referred to herein as having “cytosine-defective deaminase activity.” A second type of engineered deaminase preferentially deaminates C instead of 5mC (i.e., converts C to U at a greater rate than it converts 5mC to T) and is referred to herein as having “5mC-defective deaminase activity.” A third type of engineered deaminase preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). The third type is referred to herein as having “5hmC-defective deaminase activity.” Unless the context indicates otherwise, reference to an engineered deaminase includes engineered deaminases having cytosine-defective deaminase activity, engineered deaminases having 5mC-defective deaminase activity, and engineered deaminases having 5hmC-defective deaminase activity.
An engineered deaminase that preferentially deaminates 5mC instead of C (i.e., has cytosine-defective deaminase activity) can have a catalytic efficiency that is at least 2-fold, at least 5-fold, at least 10-fold, at least 50-fold, or at least 100-fold higher on 5mC substrates than on C substrates. In one embodiment, an engineered deaminase that preferentially deaminates 5mC instead of C can have a catalytic efficiency that is no greater than 1500-fold higher on 5mC than on C substrates.
An engineered deaminase that preferentially deaminates C instead of 5mC (i.e., has 5mC-defective deaminase activity) can have a catalytic efficiency that is at least 2-fold, at least 5-fold, at least 10-fold, at least 50-fold, or at least 100-fold higher on C than on 5mC substrates. In one embodiment, an engineered deaminase that preferentially deaminates C instead of 5mC can have a catalytic efficiency that is no greater than 1500-fold higher on C than on 5mC substrates.
For an engineered deaminase that deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC (i.e., has 5hmC-defective deaminase activity), deamination of 5hmC may be reduced by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 99% compared to the wild-type enzyme. In one embodiment, the deamination of 5hmC by an engineered deaminase disclosed herein is undetectable using an assay such as the Arc/I-based assay described herein.
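The three deamination profiles can be expressed as threshold tests on catalytic efficiency (kcat/KM) for each substrate. The Python sketch below uses the least stringent thresholds named above (a 2-fold preference, and at least a 50% reduction of 5hmC deamination relative to wild type) as default parameters. The function, its parameters, and the example values are hypothetical, and the sketch does not check the additional requirement that a 5hmC-defective enzyme still deaminates C and 5mC.

```python
def deamination_profiles(eff_c, eff_5mc, eff_5hmc=None, wt_eff_5hmc=None,
                         min_fold=2.0, min_reduction=0.5):
    """Classify a deaminase by catalytic efficiencies (kcat/KM, any consistent unit).

    Returns a list of profile names; an enzyme may match none or more than one.
    """
    profiles = []
    if eff_5mc > 0 and eff_5mc >= min_fold * eff_c:
        profiles.append("cytosine-defective")  # prefers 5mC over C
    elif eff_c > 0 and eff_c >= min_fold * eff_5mc:
        profiles.append("5mC-defective")       # prefers C over 5mC
    if eff_5hmc is not None and wt_eff_5hmc:
        reduction = 1.0 - eff_5hmc / wt_eff_5hmc  # fractional drop vs wild type
        if reduction >= min_reduction:
            profiles.append("5hmC-defective")
    return profiles


print(deamination_profiles(eff_c=1.0, eff_5mc=10.0))                   # ['cytosine-defective']
print(deamination_profiles(eff_c=10.0, eff_5mc=1.0))                   # ['5mC-defective']
print(deamination_profiles(1.0, 1.0, eff_5hmc=0.1, wt_eff_5hmc=1.0))  # ['5hmC-defective']
```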
The engineered deaminases described herein include proteins with at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% amino acid sequence similarity to either a wild-type enzyme disclosed herein (i.e., SEQ ID NO: 1 or 6-68) or a truncation thereof (i.e., SEQ ID NO: 2).
A pair-wise comparison analysis of amino acid sequences can be conducted, for instance, by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally Current Protocols in Molecular Biology, Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., supplemented through 2004). One example of an algorithm that is suitable for determining sequence similarity is the BLAST® algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). The BLAST® algorithm can be used to calculate percent sequence identity and percent sequence similarity between two sequences. Software for performing BLAST® analyses is publicly available through the National Center for Biotechnology Information. In the comparison of two amino acid sequences, similarity may be referred to by percent "identity" or may be referred to by percent "similarity." "Identity" refers to the presence of identical amino acids. "Similarity" refers to the presence of not only identical amino acids but also conservative substitutions. Thus, in one embodiment the amino acid sequence of a cytidine deaminase protein having sequence similarity to a reference sequence may include conservative substitutions of amino acids present in that reference sequence.
A conservative substitution for an amino acid in a protein may be selected from other members of the class to which the amino acid belongs. For example, it is well-known in the art of protein biochemistry that an amino acid belonging to a grouping of amino acids having a particular size or characteristic (such as charge, hydrophobicity, or hydrophilicity) can be substituted for another amino acid without altering the activity of a protein, particularly in regions of the protein that are not directly associated with biological activity. For example, amino acids having a nonpolar side chain include alanine, glycine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, and valine; amino acids having a hydrophobic side chain include glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan; amino acids having a polar side chain include arginine, asparagine, aspartic acid, glutamine, glutamic acid, histidine, lysine, serine, cysteine, tyrosine, and threonine; and amino acids having an uncharged side chain include glycine, serine, cysteine, asparagine, glutamine, tyrosine, and threonine.
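Percent identity and percent similarity, as defined above, can be computed from an existing alignment by counting identical columns and columns that are either identical or conservative. The Python sketch below uses the side-chain classes exactly as listed in the preceding paragraph, treats two residues as a conservative pair if they share any class, and divides by the number of gap-free columns. These are simplifying assumptions (BLAST, for instance, counts “positives” from substitution-matrix scores and uses its own length conventions), and the sequences are hypothetical.

```python
# Side-chain classes exactly as listed in the paragraph above (one-letter codes).
SIDE_CHAIN_CLASSES = {
    "nonpolar": set("AGILMFPWV"),
    "hydrophobic": set("GAVLIPFMW"),
    "polar": set("RNDQEHKSCYT"),
    "uncharged": set("GSCNQYT"),
}


def is_conservative(a: str, b: str) -> bool:
    """True if residues are identical or share at least one side-chain class."""
    return a == b or any(a in cls and b in cls for cls in SIDE_CHAIN_CLASSES.values())


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(is_conservative(a, b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


print(identity_and_similarity("MADK", "MVEK"))  # (50.0, 100.0)
```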
Polynucleotides Encoding Engineered Deaminases
The engineered deaminases described herein also may be identified in terms of polynucleotide sequences that encode them. Thus, this disclosure provides polynucleotides that encode an engineered deaminase described herein or hybridize, under standard hybridization conditions, to a polynucleotide that encodes an engineered deaminase described herein, and the complements of such polynucleotide sequences.
The terms “polynucleotide,” “nucleic acid,” and “oligonucleotide” are used interchangeably to refer to a polymer of DNA or RNA. A polynucleotide may be single-stranded or double-stranded and may represent the sense or the antisense strand. A polynucleotide may be synthesized or obtained from a natural source. A polynucleotide may contain natural, non-natural, or altered nucleotides, as well as natural, non-natural, or altered internucleotide linkages (e.g., phosphoramidate linkages, phosphorothioate linkages).
The polynucleotides of the present invention can include any polynucleotide sequence that encodes an engineered deaminase disclosed herein. Thus, the sequence of the polynucleotide may be deduced from the amino acid sequence that is to be encoded by the polynucleotide. Because most amino acids are encoded by more than one codon, a given engineered deaminase can be encoded by many different polynucleotide sequences, and translation systems often exhibit codon bias (i.e., different organisms often prefer one of the several synonymous codons that encode the same amino acid). As such, polynucleotides presented herein are optionally “codon optimized,” meaning that the polynucleotides are synthesized to include codons that are preferred by the particular translation system being employed to express the protein. For example, when it is desirable to express the protein in a bacterial cell (or even a particular strain of bacteria), the polynucleotide can be synthesized to include codons most frequently found in the genome of that bacterial cell, for efficient expression of the engineered deaminase. A similar strategy can be employed when it is desirable to express the engineered deaminase in a eukaryotic cell, e.g., the nucleic acid can include codons preferred by that eukaryotic cell.
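In its simplest form, codon optimization back-translates the protein using one preferred codon per amino acid for the chosen host. The Python sketch below shows that idea. The codon table is an illustrative choice of valid standard-genetic-code codons, not a measured codon-usage table for any particular strain, and practical optimization tools typically also weigh factors such as GC content and mRNA secondary structure.

```python
# One codon per amino acid (standard genetic code). Illustrative choices only,
# not derived from a codon-usage table for any specific host strain.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein: str, table=PREFERRED_CODON, add_stop=True) -> str:
    """Return a DNA coding sequence encoding `protein`, one chosen codon per residue."""
    try:
        codons = [table[residue] for residue in protein.upper()]
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from err
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


print(back_translate("MKV"))  # ATGAAAGTGTAA
```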
A polynucleotide described herein may also, advantageously, be included in a suitable expression vector to express the engineered deaminase encoded therefrom in a suitable host. Incorporation of cloned DNA into a suitable expression vector for subsequent transformation of a host cell and subsequent selection of the transformed cells is well known to those skilled in the art as provided in Sambrook et al. (1989), Molecular cloning: A Laboratory Manual, Cold Spring Harbor Laboratory.
Such an expression vector includes a vector having a polynucleotide described herein operably linked to heterologous regulatory sequences, such as promoters, that are capable of effecting expression of said DNA fragments. The term "operably linked" refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. Such vectors may be transformed into a suitable host cell to provide for the expression of an engineered deaminase.
The polynucleotide may encode either a mature protein or a protein having a pro-sequence, including that encoding a leader sequence on the preprotein which is then cleaved by the host cell to form a mature protein.
The vectors may be, for example, plasmid, virus, or phage vectors provided with an origin of replication, and optionally a promoter for the expression of said nucleotide and optionally a regulator of the promoter. The vectors may contain one or more selectable markers, such as, for example, an antibiotic resistance gene.
Regulatory elements required for expression include promoter sequences to bind RNA polymerase and direct an appropriate level of transcription initiation, as well as translation initiation sequences for ribosome binding. For example, a bacterial expression vector may include a promoter such as the lac promoter and, for translation initiation, the Shine-Dalgarno sequence and
the start codon AUG. Similarly, a eukaryotic expression vector may include a heterologous or homologous promoter for RNA polymerase II, a downstream polyadenylation signal, the start codon AUG, and a termination codon for detachment of the ribosome. Such vectors may be obtained commercially or be assembled from the sequences described by methods well known in the art.
Transcription of DNA encoding an engineered deaminase may be optimized by including an enhancer sequence in the vector. Enhancers are cis-acting elements of DNA that act on a promoter to increase the level of transcription. Vectors will also generally include origins of replication in addition to the selectable markers.
Cells that express the engineered deaminases described herein are also provided. A “cell” is the basic unit of which all living things are composed. The cells of the present invention may be of any type, including any eukaryotic or prokaryotic cell type. Examples of suitable host cells for protein expression include, but are not limited to, E. coli and S. cerevisiae.
Making and isolating engineered deaminases
Generally, polynucleotides encoding an engineered deaminase as presented herein can be made by cloning, recombination, in vitro synthesis, in vitro amplification and/or other available methods. A variety of recombinant methods can be used to express an engineered deaminase presented herein from an expression vector that encodes it. Methods for making recombinant polynucleotides, expression, and isolation of expressed products are well known and described in the art.
Polynucleotides encoding wild type cytidine deaminases can be obtained from a source and subjected to mutagenesis to introduce one or more substitution mutations described herein. In general, any available mutagenesis procedure can be used for making an engineered deaminase described herein. Procedures that can be used include, but are not limited to: site-directed point mutagenesis, in vitro or in vivo homologous recombination, oligonucleotide-directed mutagenesis, mutagenesis by total gene synthesis, and many others known to persons skilled in the art.
Additional useful references for mutation, recombinant, and in vitro nucleic acid manipulation methods (including cloning, expression, PCR, and the like) include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif. (Berger); Kaufman et al. (2003) Handbook of Molecular and Cellular Methods in Biology and Medicine Second Edition Ceske (ed) CRC Press (Kaufman); The Nucleic
Acid Protocols Handbook Ralph Rapley (ed) (2000) Cold Spring Harbor, Humana Press Inc (Rapley); Chen et al. (ed) PCR Cloning Protocols, Second Edition (Methods in Molecular Biology, volume 192) Humana Press; and in Viljoen et al. (2005) Molecular Diagnostic PCR Handbook Springer, ISBN 1402034032.
In addition, many kits are commercially available for the purification of plasmids or other relevant nucleic acids from cells. An isolated polynucleotide can be further manipulated to produce other polynucleotides, used to transfect or transform cells, incorporated into related vectors and introduced into cells for expression, and/or the like. Typical cloning vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular target nucleic acid. The vectors optionally include generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, prokaryotes, or both (e.g., shuttle vectors), and selection markers for both prokaryotic and eukaryotic systems. Such vectors can be suitable for replication and integration in prokaryotes, eukaryotes, or both.
Other useful references, e.g., for cell isolation and culture (e.g., for subsequent nucleic acid isolation), include Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, third edition, Wiley-Liss, New York, and the references cited therein; Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems John Wiley & Sons, Inc. New York, N.Y.; Gamborg and Phillips (eds) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York); and Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, Fla. Construction of vectors containing a nucleic acid encoding an engineered deaminase described herein employs standard ligation techniques known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989) or Ausubel, F.M., ed., Current Protocols in Molecular Biology (1994).
A variety of protein isolation and detection methods are known and can be used to isolate an engineered deaminase, e.g., from recombinant cultures of cells expressing the recombinant cytidine deaminase presented herein, including, e.g., those set forth in R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982); Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990); Sandana (1997) Bioseparation of Proteins, Academic Press, Inc.; Bollag et al. (1996) Protein Methods, 2nd Edition Wiley-Liss, NY; Walker (1996) The Protein Protocols Handbook Humana Press, NJ; Harris and Angal (1990) Protein Purification Applications: A Practical Approach IRL Press at Oxford, Oxford, England; Harris and Angal Protein Purification Methods: A Practical Approach IRL Press at Oxford, Oxford, England; Scopes (1993) Protein Purification: Principles and Practice 3rd Edition Springer Verlag, NY; Janson and Rydén (1998) Protein Purification: Principles, High Resolution Methods and Applications, Second Edition Wiley-VCH, NY; and Walker (1998) Protein Protocols on CD-ROM Humana Press, NJ; and the references cited therein. Additional details regarding protein purification and detection methods can be found in Satinder Ahuja ed., Handbook of Bioseparations, Academic Press (2000).
An engineered deaminase protein or polynucleotide can be isolated. An "isolated" protein or polynucleotide is one that has been removed from a cell. For instance, an isolated protein is a polypeptide that has been removed from the cytoplasm or from the membrane of a cell, and many of the proteins, nucleic acids, and other cellular material of its natural environment are no longer present. Proteins that are produced outside of a cell, e.g., through chemical or recombinant means, are considered to be isolated by definition, as they were never present in a cell.
Methods of use
The engineered deaminases provided by the present disclosure can be integrated into essentially any application for identifying modified cytosines. For instance, engineered deaminases can be integrated into applications that include sequencing library preparation. Examples of sequencing library preparation include, but are not limited to, whole genome, accessible chromatin (e.g., ATAC), conformational state (e.g., Hi-C), and reduced representation bisulfite sequencing (RRBS). The engineered deaminases can be particularly useful in essentially any application using low input DNA or RNA such as, but not limited to, single cell combinatorial indexing (sci) methods like sci-WGS-seq, sci-MET-seq, sci-ATAC-seq, and sci-RNA-seq, and cell-free DNA-based methods. Specific applications include, but are not limited to, identifying one or more patterns of cytosine modification such as determining methylation on CpG islands and reduced representation bisulfite sequencing (RRBS); variant calling, including SNV/indel, copy number variation (CNV), short tandem repeats (STR), and structural variants (SV); detecting differentially methylated regions (DMRs); measuring methylation at promoters; and detecting tumor DNA.
The engineered deaminases provided by the present disclosure can be integrated into any application that includes locus-specific methylation profiling. Typical locus-specific detection of epigenetically methylated cytosines, such as 5mC, requires the use of 5mC-specific antibodies, or multi-step chemical or chemoenzymatic transformations that deaminate C or 5mC to U or T, respectively, to enable differentiation of the two C-isoforms. When combined with various in vitro detection methodologies, these approaches can robustly detect 5mC at defined loci. However, these methods can be confounded by antibody cross-reactivity and limited stability, or by the toxicity and complex workflows required by chemical and chemoenzymatic approaches. Use of an engineered deaminase described herein in a single enzymatic deamination protocol permits selective conversion of 5mC to T that is compatible with a number of in vitro diagnostic modalities, resulting in locus-specific detection of 5mC.
Integrating the cytidine deaminases provided by the present disclosure, in place of destructive methods, into methods for identifying modified cytosines, such as sequencing library production and locus-specific methylation profiling, results in more efficient enzyme-catalyzed conversion of modified cytosines during generation of target nucleic acids. This permits better sequencing data and better retention of genetic information, as demonstrated by high variant calling performance. Furthermore, because conversion is enzymatic, use of the engineered deaminases enables high coverage uniformity and low sample damage, which in turn lowers nucleic acid input requirements. A multitude of sequencing library methods are known to a skilled person that can be used in the construction of whole-genome or targeted libraries (see, for instance, Sequencing Methods Review, available on the world wide web at illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/sequencing-methods-review.pdf; DNA Sequencing Methods Collection, available on the world wide web at illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/dna-sequencing-methods-review-web.pdf; and RNA Sequencing Methods Collection, available on the world wide web at illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/rna-sequencing-methods-review-web.pdf).
In general, methods for using an engineered deaminase of the present disclosure include contacting target nucleic acids, e.g., DNA or RNA, with the enzyme under conditions suitable for deamination of modified cytidines. Because amplification of DNA does not preserve the modification status of cytidine (e.g., the methylation status of 5mC and 5hmC is not retained), target DNA is typically contacted with an engineered deaminase before amplification. Target nucleic acids can be contacted with an engineered deaminase at essentially any point in a method before an amplification, provided the nucleic acid is single-stranded when contacted. For instance, target nucleic acids can be contacted with an engineered deaminase while the nucleic acids are inside a fixed or unfixed cell or nucleus, after isolation of genomic or cell-free DNA or mRNA, before or after fragmentation, or before or after tagmentation. The skilled person will recognize that target nucleic acids can be contacted with an engineered deaminase after addition of a universal sequence and/or an adapter, provided the universal sequence and/or adapter is not added by amplification.
A method for using an engineered deaminase can include the optional step of comparing the treated target nucleic acid with an untreated nucleic acid or with a nucleic acid treated with a wild type cytidine deaminase. For instance, in embodiments where the treated nucleic acid is sequenced, the sequence can be compared to a reference sequence, permitting easy identification of point mutations and inference of modified cytosines. Thus, in embodiments where an engineered deaminase having cytosine-defective deaminase activity (i.e., converts 5mC to T at a greater rate than it converts C to U) is used, C to T point mutations can be easily identified, and these point mutations are inferred as 5mC positions. In embodiments where an engineered deaminase having 5hmC-defective deaminase activity (i.e., preferentially deaminates C and 5mC to U and T, respectively, and has significantly reduced deamination of 5hmC) is used, positions lacking C to T point mutations can be easily identified, and these positions are inferred as 5hmC positions. In embodiments where the treated nucleic acid is not sequenced, the nucleic acid treated with an engineered deaminase can be compared to the same nucleic acid left untreated, i.e., not contacted with an engineered deaminase. Here the read-out typically depends on the assay method; for instance, when an amplification is used, the relative amounts of amplification can be determined and the presence or absence of a 5mC, or of a pattern of cytosine modification, at a predetermined sequence can be inferred.
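The sequencing-based comparison described above can be sketched in Python. This is a minimal illustration, not a production methylation caller: it assumes reads are already aligned to the reference without indels (so read and reference share coordinates) and that a cytosine-defective engineered deaminase was used, so a T at a reference C is counted as a former 5mC. The function name `call_5mc` and the `min_depth` cutoff are hypothetical.

```python
from collections import Counter


def call_5mc(reference: str, reads: list[str], min_depth: int = 1) -> dict[int, float]:
    """Return {position: fraction of reads showing T} for each reference C.

    Assumes every read is aligned to `reference` with no indels (same length,
    same coordinates) and that a cytosine-defective deaminase was used, so a
    T observed at a reference C is interpreted as a converted 5mC.
    """
    calls: dict[int, float] = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue  # only reference cytosines can report modification
        # Count only informative bases (C = unconverted, T = converted).
        counts = Counter(read[pos] for read in reads if read[pos] in ("C", "T"))
        depth = counts["C"] + counts["T"]
        if depth >= min_depth:
            calls[pos] = counts["T"] / depth
    return calls
```

For example, `call_5mc("ACGTCGA", ["ATGTCGA", "ACGTCGA", "ATGTTGA"])` reports two of three reads converted at position 1 and one of three at position 4. For a 5hmC-defective enzyme the same counts would be read the other way: the retained-C fraction, `1 - calls[pos]`, would approximate the 5hmC fraction at that position.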
Reaction conditions suitable for deamination of modified cytidines, such as conversion of 5mC to thymidine, by a cytidine deaminase described herein include, but are not limited to, a substrate of target nucleic acid that is single-stranded (ss) DNA or RNA suspected of including at least one modified cytidine, the buffer, pH, temperature of the reaction, time of the reaction, and concentration of the engineered deaminase and/or ss DNA or RNA substrate. In one embodiment, double-stranded (ds) DNA can be denatured and exposed to an engineered deaminase. Methods for denaturing dsDNA are known and routine, and include heat treatment, chemical treatment with, e.g., NaOH, formamide, DMSO, or N,N-dimethylformamide (DMF), or a combination thereof.
Target nucleic acids useful in the methods of the present disclosure are described herein. A modified cytidine present on a substrate single-stranded (ss) DNA or RNA includes, but is not limited to, 5-methyl cytosine (5mC), 5-hydroxymethyl cytosine (5hmC), 5-formyl cytosine (5fC), and 5-carboxy cytosine (5caC) (FIG. 2). In one embodiment, the modified cytidine is 5-methyl cytosine. In one embodiment, the modified cytidine is 5-hydroxymethyl cytosine. Methods that use double-stranded target DNA for generating a sequencing library can be modified to include denaturation to convert the double-stranded target DNA to ssDNA. In some embodiments, dsDNA that is used in a tagmentation reaction or for adapter attachment can be denatured and then treated with an engineered deaminase. Conditions for denaturation are known and routine. In those embodiments where ssDNA is contacted with an engineered deaminase and subsequently used in a process that requires dsDNA, e.g., addition of universal adapters by tagmentation or ligation, the ssDNA can be converted to dsDNA using routine methods.
In some embodiments, an engineered deaminase as presented herein can be used to differentiate between 5-methyl cytosine (5mC) and 5-hydroxymethyl cytosine (5hmC). In such an embodiment, a sample of DNA suspected of including single-stranded DNA comprising at least one 5mC or 5hmC is modified to prevent an engineered deaminase from converting 5hmC to thymidine. Methods for blocking deaminase activity are known in the art, and any one of a number of methods can be used to protect 5hmC from deaminase activity. As one example, target DNA can be treated to modify 5hmC but not 5mC such that 5hmC is an unsuitable substrate for cytidine deaminase activity. In a specific example, a glucosyltransferase enzyme can be used to glucosylate 5hmC but not 5mC. Glucosyltransferase enzymes are known to those of skill in the art and include, for example, β-glucosyltransferase (βGT). By way of example, the enzyme T4 β-glucosyltransferase is commercially available (βGT, NEB) and can be used for modification of 5hmC. Methods for using a βGT to glucosylate 5hmC are known in the art and can be used in conjunction with the engineered deaminase enzymes presented here. For example, a sample of DNA can be treated with a βGT to glucosylate 5hmC in the sample DNA prior to treating the DNA with the engineered deaminase enzyme. By treating the sample DNA with a βGT, 5hmC is protected from the deaminase activity of the engineered deaminase enzyme. Thus, 5mC will be detected as a thymidine in a downstream readout, such as sequencing, PCR, array, and the like. In contrast, any protected 5hmC sites will be detected as cytosine in the same readout. Enzymes, buffers, and conditions for performing glucosylation of 5hmC are known in the art, as exemplified by the methods disclosed in Schutsky et al., Nature Biotechnology, 8 Oct. 2018, doi: 10.1038/nbt.4204.
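The glucosylation logic above can be expressed as a small lookup table. This sketch assumes two parallel reactions on the same sample, one without and one with βGT pretreatment, and a cytosine-defective engineered deaminase that leaves unmodified C unconverted; it further assumes, as implied by the need for protection, that unprotected 5hmC is deaminated and read as T. The two-reaction design and the names `DECODE` and `classify` are illustrative assumptions, not a protocol stated in the disclosure.

```python
# Expected base call at a reference C in two parallel reactions:
# (call without βGT pretreatment, call with βGT pretreatment).
# Assumes a cytosine-defective deaminase (unmodified C stays C) and that
# unprotected 5hmC is deaminated and read as T.
DECODE = {
    ("T", "T"): "5mC",   # converted whether or not βGT was applied
    ("T", "C"): "5hmC",  # converted only when not protected by glucosylation
    ("C", "C"): "C",     # not converted in either reaction
}


def classify(call_without_bgt: str, call_with_bgt: str) -> str:
    """Map a pair of base calls at one reference C to an inferred cytosine state."""
    return DECODE.get((call_without_bgt, call_with_bgt), "ambiguous")
```

Any combination outside the table, such as a C without βGT but a T with it, is reported as `"ambiguous"`, since none of the assumed chemistries predicts it.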
In some embodiments, an engineered deaminase as presented herein can be used in in vitro diagnostic (IVD) approaches for profiling methylation in a locus-specific manner. Current methods for methylation biomarker detection typically include digestion of genomic DNA with methylation-sensitive restriction enzymes followed by quantitative PCR (qPCR) at a locus of interest to quantify the extent of restriction enzyme digestion, and therefore the percent methylation at that site. Another approach is mismatch-sensitive qPCR of bisulfite-treated DNA, where 5mC is read out as a lack of C>T conversion. These methods, however, have drawbacks. The recognition site of the methylation-sensitive restriction enzyme must be present in the methylated region of the target locus. Bisulfite treatment requires a large quantity of starting DNA and results in conversion to a low-complexity genome (unmethylated cytosines, which represent the majority of cytosines in the genome, are converted to U and read as T). This reduced complexity of the genomic template constrains the design of qPCR primers that hybridize specifically to the locus of interest. During bisulfite conversion, DNA is intrinsically damaged or lost, which can hinder downstream analysis. DNA damage decreases coverage uniformity across the genome, which can lead to biased coverage. Furthermore, incomplete bisulfite conversion has the potential to adversely affect results, since it can exaggerate DNA methylation levels (Sam et al., PLoS One. 2018; 13(6); Ehrich et al., Nucleic Acids Res. (Oxford University Press) 2007; 35: e29).
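For context, the restriction-digest readout described above is commonly converted to percent methylation with a ΔCt calculation. The sketch below assumes that the methylation-sensitive enzyme cuts every unmethylated copy, so only methylated copies remain amplifiable, and that each PCR cycle multiplies template by `1 + efficiency` (perfect doubling when `efficiency=1.0`). The function name is hypothetical.

```python
def percent_methylated(ct_digested: float, ct_mock: float, efficiency: float = 1.0) -> float:
    """Estimate % methylation at a site from a methylation-sensitive digest.

    ct_digested: Ct of the locus after digestion.
    ct_mock:     Ct of the same locus in a mock (undigested) reaction.
    Assumes only methylated copies survive digestion and that template grows
    by a factor of (1 + efficiency) per cycle.
    """
    delta_ct = ct_digested - ct_mock
    surviving_fraction = (1.0 + efficiency) ** (-delta_ct)
    # A negative delta (measurement noise) would imply >100 %; cap it.
    return 100.0 * min(surviving_fraction, 1.0)
```

Under these assumptions a one-cycle delay after digestion, `percent_methylated(26.0, 25.0)`, corresponds to 50 % methylation, and a three-cycle delay to 12.5 %.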
5mC to T conversion by the engineered deaminases described herein obviates the need for restriction enzymes or bisulfite treatment and preserves DNA complexity. The resulting modifications of one or more cytosines can be detected using established in vitro diagnostic (IVD) approaches for profiling methylation in a locus-specific manner. Examples of approaches include detection of 5mC loci via amplification, e.g., quantitative PCR (qPCR); detection of 5mC loci using a CRISPR-based system, e.g., CRISPR-Cas12; spatial detection of 5mC using molecular cytogenetic methods, e.g., fluorescence in situ hybridization (FISH); and array-based detection of 5mC. In one embodiment, IVD approaches for profiling methylation in a locus-specific manner use one or more primers that anneal to a predetermined sequence that may include one or more modified cytosines. After treatment of target nucleic acids with an engineered deaminase, the modified cytosines present in the target nucleic acids are converted as described herein (e.g., 5mC is converted to T), and primers can be readily designed to anneal with higher affinity to a predetermined sequence when it includes nucleotides resulting from the deaminase treatment (e.g., a T nucleotide where a 5mC was present prior to treatment). For example, primers used for an amplification bind with greater affinity to a nucleic acid that includes T nucleotides where 5mC nucleotides were present prior to treatment. The annealing of a primer to a predetermined sequence that includes the expected 5mC to T conversion(s) allows one to infer the location of a modified cytosine in the untreated target nucleic acid. Such a primer can include at least 1, at least 2, at least 3, at least 4, or at least 5 nucleotides that base-pair with a nucleotide resulting from conversion of 5mC to T, i.e., an adenine (A); when amplification is used, it can be paired with a second primer for the reverse strand that has a T instead of a guanine (G).
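The primer-design reasoning above can be illustrated with an in silico conversion of a reference sequence. This sketch assumes a cytosine-defective deaminase, so only positions listed as 5mC become T and unmodified C is left unchanged; real primer design would also weigh melting temperature, specificity, and partial methylation, none of which is modelled here. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert(reference: str, methylated: set[int]) -> str:
    """Expected treated strand: 5mC positions become T, all other bases unchanged."""
    return "".join(
        "T" if i in methylated and base == "C" else base
        for i, base in enumerate(reference)
    )


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def matching_primer(converted: str, start: int, length: int) -> str:
    """Primer identical to the treated strand (carries T at converted sites)."""
    return converted[start:start + length]


def annealing_primer(converted: str, start: int, length: int) -> str:
    """Primer that anneals to the treated strand (carries A opposite converted sites)."""
    return reverse_complement(matching_primer(converted, start, length))


def informative_sites(methylated: set[int], start: int, length: int) -> int:
    """Number of 5mC-to-T positions covered by a primer footprint."""
    return sum(start <= i < start + length for i in methylated)
```

With a reference `"GACGTTCGAA"` methylated at positions 2 and 6, `convert` gives `"GATGTTTGAA"`; a primer annealing to the whole treated strand is `"TTCAAACATC"`, with an A opposite each converted position, and `informative_sites` reports that its footprint covers both conversions.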
In some embodiments, target nucleic acids obtained from a subject can be treated with an engineered deaminase to result in converted nucleic acids, and a pattern of cytosine modification can be identified in the converted nucleic acids. The pattern of cytosine modification can optionally be compared with the pattern of cytosine modification in a reference nucleic acid. In embodiments where a pattern of cytosine modification correlates with a disease or condition, the method can be used in diagnostic or prognostic applications. For instance, the subject can have or be at risk of having a disease or condition, and the reference nucleic acid can be from a normal subject, e.g., a subject that does not have and is not at risk for the disease or condition. The pattern of cytosine modification can be associated with a disease or condition (e.g., the target nucleic acid can be a predetermined sequence), and identification in the subject of a pattern of cytosine modification associated with a disease or condition can indicate that the subject has or is at risk of having the disease or condition. For instance, a pattern of cytosine modification can be linked in cis to a coding region that is correlated with a disease or condition, and identification of that pattern, or absence of that pattern, in the subject can be used for diagnosis or prognosis. In one embodiment, the coding region can be one that is transcriptionally active or transcriptionally inactive in a reference nucleic acid. The comparison of the converted nucleic acid to the reference nucleic acid can include determining whether the pattern of cytosine modification of the converted nucleic acid indicates that the coding region is transcriptionally active or transcriptionally inactive in the subject. When that coding region is associated with a disease or condition, the status of transcriptional activity can be used for diagnosis or prognosis.
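The comparison with a reference nucleic acid can be sketched as a per-site difference in methylation fraction. This assumes both samples have already been reduced to `{position: fraction}` maps on shared coordinates (e.g., by a per-site caller), and the `min_delta` cutoff of 0.2 is an arbitrary illustrative value, not a threshold from the disclosure; a diagnostic assay would rely on validated thresholds and statistical testing. The function name is hypothetical.

```python
def differential_sites(subject: dict[int, float], reference: dict[int, float],
                       min_delta: float = 0.2) -> dict[int, float]:
    """Return {position: subject - reference} for sites present in both samples
    whose methylation fractions differ by at least `min_delta`.

    Positive values mean the subject is more methylated than the reference.
    """
    shared = subject.keys() & reference.keys()
    return {
        pos: subject[pos] - reference[pos]
        for pos in sorted(shared)
        if abs(subject[pos] - reference[pos]) >= min_delta
    }
```

Sites covered in only one sample are skipped rather than treated as unmethylated, which avoids mistaking missing coverage for a change in the pattern.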
Comparison of patterns of cytosine modification can also be used to identify changes in a subject's pattern of cytosine modification over time. For instance, a subject may have a disease or condition and be undergoing treatment, or a subject may have had a disease or condition and be cured (e.g., the subject was treated and no signs of the disease or condition are present) or in remission (e.g., the subject was treated and signs of the disease or condition are reduced). Target nucleic acids from the subject at different times, e.g., before treatment started, during treatment, and after treatment is stopped, can be compared, and the pattern of cytosine modification of a sequence, e.g., a predetermined sequence, can be used to determine the progress of a treatment or the status of the disease or condition in the subject.
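Tracking a predetermined sequence over time can be sketched by averaging per-site methylation fractions over a region at each timepoint and expressing each value relative to a baseline. The timepoint labels, the region, and the function names are hypothetical, and a regional mean is only one possible summary of a pattern of cytosine modification.

```python
from statistics import mean


def regional_trajectory(samples: dict[str, dict[int, float]],
                        region: range) -> dict[str, float]:
    """Mean per-site methylation fraction over `region` for each timepoint.

    `samples` maps a timepoint label (e.g., "pre", "during", "post") to
    {position: methylation fraction}. Timepoints with no covered site in the
    region are reported as NaN.
    """
    trajectory = {}
    for label, calls in samples.items():
        values = [calls[pos] for pos in region if pos in calls]
        trajectory[label] = mean(values) if values else float("nan")
    return trajectory


def change_from_baseline(trajectory: dict[str, float], baseline: str) -> dict[str, float]:
    """Difference of each timepoint's regional mean from the baseline timepoint."""
    return {label: value - trajectory[baseline] for label, value in trajectory.items()}
```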
In some embodiments where detection of 5mC nucleotides uses amplification, the use of a polymerase that disfavors uracil can help reduce amplification of treated target nucleic acids that include spurious C to U conversions resulting from use of an engineered deaminase. B-family polymerases are known to exhibit a “uracil read-ahead” function that causes the polymerase to stall at uracil residues (Greagg et al., 1999, Proc. Natl. Acad. Sci. USA 96(16):9045-50). Examples of B-family polymerases that disfavor uracil include archaeal B-family polymerases from Pyrococcus furiosus (Pfu), Thermococcus kodakarensis (KOD), Thermococcus litoralis (Tli/Vent), Pyrococcus woesei (Pwo), and Thermococcus fumicolans (Tfu). Other examples of uracil-disfavoring polymerases include Phusion™, Q5®, and Kapa HiFi™. In other embodiments where amplification of nucleic acids containing uracil nucleotides is desired, a uracil-tolerant polymerase can be used. Examples of uracil-tolerant polymerases include PhusionU™, Q5U®, KapaU™, Taq, and Dpo4.
It is expected that an engineered deaminase can function in essentially any buffer. Examples of useful buffers include, but are not limited to: a citrate buffer, such as the citrate buffer available from Thermo Fisher Scientific (Cat. No. #005000); sodium acetate buffer; Bis-Tris propane-HCl; and Tris-HCl. Examples of other buffers include, but are not limited to, Bicine, DIPSO, glycylglycine, HEPES, imidazole, malonate, MES, MOPS, PB, phosphate, PIPES, SPG, succinate, TAPS, TAPSO, and tricine. In some embodiments, a reducing agent such as dithiothreitol (DTT) can be present. In some embodiments, a divalent cation is not included.
A deamination reaction can occur at “a temperature that destabilizes ssDNA secondary structure”. The minimal temperature that destabilizes ssDNA secondary structure is largely dependent on the specific sequence of the ssDNA, but generally falls within the range of 50°C to 60°C. Suitably, the deamination reaction can occur at a temperature between about 50°C and 90°C, 60°C and 80°C, or 70°C and 80°C.
Some engineered deaminases described herein deaminate a modified cytosine to thymidine at a faster rate than they deaminate cytosine to uracil. Thus, in some embodiments the reaction time can be chosen to maximize the difference between deamination of modified cytosine and deamination of cytosine. In one embodiment, the reaction can proceed for at least 15 minutes, at least 30 minutes, at least 45 minutes, at least 60 minutes, at least 90 minutes, at least 120 minutes, or at least 150 minutes, and for no greater than 15 minutes, no greater than 30 minutes, no greater than 45 minutes, no greater than 60 minutes, no greater than 90 minutes, no greater than 120 minutes, no greater than 150 minutes, or no greater than 180 minutes.
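Why reaction time can tune selectivity is easiest to see with a simple kinetic model. Assuming, hypothetically, that 5mC and C are each deaminated by independent first-order processes with rate constants k_mc > k_c, the difference in fraction converted, exp(-k_c·t) - exp(-k_mc·t), first rises and then falls, peaking at t* = ln(k_mc/k_c) / (k_mc - k_c). This model and the rate constants in the usage note are illustrative assumptions, not measured values for any enzyme described here.

```python
import math


def conversion(k: float, t: float) -> float:
    """Fraction of sites deaminated after time t (first-order, rate constant k)."""
    return 1.0 - math.exp(-k * t)


def selectivity_gap(k_mc: float, k_c: float, t: float) -> float:
    """Fraction of 5mC converted minus fraction of C converted at time t."""
    return conversion(k_mc, t) - conversion(k_c, t)


def optimal_time(k_mc: float, k_c: float) -> float:
    """Time that maximizes selectivity_gap, for k_mc > k_c > 0.

    Setting d/dt [exp(-k_c*t) - exp(-k_mc*t)] = 0 gives
    k_mc*exp(-k_mc*t) = k_c*exp(-k_c*t), i.e. t* = ln(k_mc/k_c) / (k_mc - k_c).
    """
    if not k_mc > k_c > 0:
        raise ValueError("requires k_mc > k_c > 0")
    return math.log(k_mc / k_c) / (k_mc - k_c)
```

With hypothetical constants of 0.05 per minute for 5mC and 0.005 per minute for C, `optimal_time(0.05, 0.005)` is about 51 minutes, which would fall inside the 45 to 60 minute window listed above; longer reactions convert more 5mC but let off-target C conversion catch up.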
In one embodiment, a deamination reaction can include an engineered deaminase at a concentration from at least 0.5 micromolar (µM) to no greater than 5 µM. For instance, the concentration of the enzyme can be at least 0.5 µM, at least 1 µM, at least 2 µM, at least 3 µM, at least 4 µM, or 5 µM, and/or no greater than 5 µM, no greater than 4 µM, no greater than 3 µM, no greater than 2 µM, no greater than 1 µM, or 0.5 µM. In one embodiment, a deamination reaction can include nucleic acids at a concentration of at least 400 nanomolar (nM) to no greater than 2 µM. For instance, the concentration of nucleic acids can be at least 400 nM, at least 500 nM, at least 600 nM, at least 700 nM, at least 800 nM, at least 900 nM, or 1 µM, and/or no greater than 1 µM, no greater than 900 nM, no greater than 800 nM, no greater than 700 nM, no greater than 600 nM, no greater than 500 nM, or 400 nM.
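Setting up a reaction within these ranges is a C1·V1 = C2·V2 dilution. The sketch below encodes the enzyme range (0.5 to 5 µM) and nucleic acid range (400 nM to 2 µM) stated in the text; the stock concentrations and the 50 µl reaction volume in the usage note are hypothetical, and the function names are illustrative.

```python
# Ranges stated in the text, in micromolar.
ENZYME_RANGE_UM = (0.5, 5.0)         # 0.5 µM to 5 µM engineered deaminase
NUCLEIC_ACID_RANGE_UM = (0.4, 2.0)   # 400 nM to 2 µM nucleic acid


def in_range(value_um: float, bounds: tuple[float, float]) -> bool:
    """True if a final concentration lies within the inclusive bounds."""
    low, high = bounds
    return low <= value_um <= high


def stock_volume(stock_um: float, final_um: float, total_ul: float) -> float:
    """Volume of stock (µl) to add so that C1 * V1 = C2 * V2."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_um * total_ul / stock_um
```

For example, a hypothetical 20 µM enzyme stock brought to 2 µM in a 50 µl reaction needs `stock_volume(20.0, 2.0, 50.0)`, i.e., 5 µl, and `in_range(2.0, ENZYME_RANGE_UM)` confirms the target lies inside the stated enzyme range.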
In one embodiment, a deamination reaction can include an RNase. RNase A has been implicated in increasing the activity of cytidine deaminases (Bransteitter et al., Proceedings of the National Academy of Sciences of the United States of America 100, no. 7 (2003): 4102-7, doi.org/10.1073/pnas.0730835100). When the activity of an engineered deaminase of the present disclosure was determined in the presence of RNase A, the opposite was observed. When RNase A was included in the reaction, an engineered deaminase having cytosine-defective deaminase activity (i.e., converts 5mC to T at a greater rate than it converts C to U) had reduced activity, and the reduction was more pronounced for off-target cytosine deamination. Thus, RNase A resulted in greater selectivity for deamination of 5mC compared to C. RNase A can be included in a deamination reaction at a concentration from at least 1 microgram/milliliter (ug/ml) to no greater than 20 µM. For instance, the concentration of RNase A can be at least 1 ug/ml, at least 2 ug/ml, at least 3 ug/ml, at least 4 ug/ml, at least 5 ug/ml, at least 6 ug/ml, at least 7 ug/ml, at least 8 ug/ml, or at least 9 ug/ml, and/or no greater than 50 ug/ml, no greater than 40 ug/ml, no greater than 30 ug/ml, no greater than 20 ug/ml, no greater than 19 ug/ml, no greater than 18 ug/ml, no greater than 17 ug/ml, no greater than 16 ug/ml, no greater than 15 ug/ml, no greater than 14 ug/ml, no greater than 13 ug/ml, no greater than 12 ug/ml, or no greater than 11 ug/ml. In one embodiment, the concentration of RNase A is from 2 ug/ml to 13 ug/ml, or from 5 ug/ml to 10 ug/ml.
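Because the RNase A amounts above are given by mass while the enzyme and nucleic acid amounts are molar, it can help to convert between the two. This sketch assumes a molecular weight of about 13.7 kDa for bovine pancreatic RNase A; substitute the value supplied with the reagent if it differs. The constant and function names are illustrative.

```python
# Approximate molecular weight of bovine pancreatic RNase A, in daltons (g/mol).
# Illustrative assumption; use the value supplied with the reagent if different.
RNASE_A_DA = 13_700.0


def ug_per_ml_to_um(ug_per_ml: float, mw_da: float = RNASE_A_DA) -> float:
    """Convert a mass concentration (µg/ml, equal to mg/l) to micromolar."""
    return ug_per_ml * 1000.0 / mw_da


def um_to_ug_per_ml(um: float, mw_da: float = RNASE_A_DA) -> float:
    """Convert micromolar to a mass concentration in µg/ml."""
    return um * mw_da / 1000.0
```

Under that assumption, the 5 to 10 ug/ml range corresponds to roughly 0.36 to 0.73 µM.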
Target nucleic acids
The target nucleic acids contacted with an engineered deaminase and used in the methods, compositions, and kits provided herein may be essentially any nucleic acid of known or unknown sequence. Sequencing may result in determination of the sequence of the whole or a part of the target molecule. In one embodiment, target nucleic acids can be processed into templates suitable for amplification by the placement of universal amplification sequences, e.g., sequences present in a universal adaptor, at the ends of each target fragment.
Target nucleic acids are typically derived from primary nucleic acids present in a sample, such as a biological sample. The primary nucleic acids may originate as DNA or RNA. DNA primary nucleic acids may originate in double-stranded DNA (dsDNA) form (e.g., genomic DNA, genomic DNA fragments, cell-free DNA, and the like) from a sample or may originate in single-stranded form from a sample. RNA primary nucleic acids may be mRNA or non-coding RNA, e.g., microRNA or small interfering RNA. The precise sequence of the polynucleotide molecules from a primary nucleic acid sample is generally not material to the disclosure and may be known or unknown.
The primary nucleic acid molecules may represent the entire genetic complement of an organism, e.g., genomic DNA molecules that include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. The primary nucleic acid molecules may represent the entire genetic complement of specific cells of an organism, e.g., tumor cells, including genomic DNA molecules that include both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. In one embodiment, particular subsets of genomic DNA can be used, such as, for example, particular chromosomes, DNA associated with open chromatin, DNA associated with closed chromatin, or one or more specific sequences such as a region of a specific gene (e.g., targeted sequencing). In one embodiment, the primary nucleic acid molecules may represent a particular subset of DNA, e.g., DNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment. In one embodiment, a particular subset of DNA can be used, such as cell-free DNA, which can include DNA of the subject including DNA from normal cells, DNA from diseased cells such as tumor cells, and/or DNA from fetal cells.
The primary nucleic acid molecules may represent the entire transcriptome of cells of an organism, e.g., mRNA molecules. The primary nucleic acid molecules may represent the entire transcriptome of specific cells of an organism, e.g., tumor cells or the cells of a tissue. In one embodiment, the primary nucleic acid molecules may represent a particular subset of mRNA, e.g., mRNA having a specific sequence that anneals with a primer such as one used for targeted sequencing or target enrichment.
A sample, such as a biological sample, can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, stool, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples. In some embodiments, the sample can be an epidemiological, agricultural, forensic or pathogenic sample. In some embodiments, the sample can include cultured cells. In some embodiments, the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source. In another embodiment, the sample can include nucleic acid molecules obtained from a non-mammalian source such as a plant, bacteria, virus, or fungus. In some embodiments, the source of the nucleic acid molecules may be an archived or extinct sample or species.
Additional non-limiting examples of sources of biological samples include whole organisms as well as samples obtained from a patient. The biological sample can be obtained from any biological fluid or tissue and can be in a variety of forms, including fluid (e.g., liquid or gas), tissue (including solid tissue), and preserved forms of such a fluid or tissue, such as dried, frozen, and fixed forms. The sample may be of any biological tissue, cells, or fluid. Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., white cells), ascitic fluid, urine, saliva, tears, vaginal fluid (discharge), washings obtained during a medical procedure (e.g., pelvic or other washings obtained during biopsy, endoscopy, or surgery), tissue, nipple aspirate, core or fine needle biopsy samples, cell-containing body fluids, peritoneal fluid, and pleural fluid, or cells therefrom, and free-floating nucleic acids such as cell-free circulating DNA. Biological samples may also include sections of tissues such as frozen or fixed sections taken for histological purposes or micro-dissected cells or extracellular parts thereof. In some embodiments, the sample can be a blood sample, such as, for example, a whole blood sample. In another example, the sample is an unprocessed dried blood spot (DBS) sample. In yet another example, the sample is a formalin-fixed paraffin-embedded (FFPE) sample. In yet another example, the sample is a saliva sample. In yet another example, the sample is a dried saliva spot (DSS) sample.
Exemplary biological samples from which target nucleic acids can be derived include, for example, those from a eukaryote, for instance a mammal, such as a rodent, mouse, rat, rabbit, guinea pig, ungulate, horse, sheep, pig, goat, cow, cat, dog, primate, human or non-human primate; a plant, such as Arabidopsis thaliana, corn, sorghum, oat, wheat, rice, canola, or soybean; an alga, such as Chlamydomonas reinhardtii; a nematode, such as Caenorhabditis elegans; an insect or other arthropod, such as Drosophila melanogaster, mosquito, fruit fly, honey bee, or spider; a fish, such as zebrafish or Takifugu rubripes; a reptile; an amphibian, such as a frog or Xenopus laevis; Dictyostelium discoideum; a fungus, such as Pneumocystis carinii, yeast, Saccharomyces cerevisiae, or Schizosaccharomyces pombe; or a protozoan, such as Plasmodium falciparum. Target nucleic acids can also be derived from a prokaryote, such as a bacterium (e.g., Escherichia coli, Staphylococcus, or Mycoplasma pneumoniae); an archaeon; a virus, such as Hepatitis C virus or human immunodeficiency virus; or a viroid. Target nucleic acids can be derived from a homogeneous culture or population of organisms described herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
In some embodiments, a biological sample includes tissue that is processed to obtain the desired primary nucleic acids. In some embodiments, cells are used to obtain the desired primary nucleic acids. In some embodiments, nuclei are used to obtain the desired primary nucleic acids. The method can further include dissociating cells and/or isolating nuclei from cells. Methods for isolating cells and nuclei from tissue are available (WO 2019/236599).
In some embodiments, nucleic acids present in tissue, in cells, or in isolated nuclei can be processed depending on the desired read-out. For instance, nucleic acids can be fixed during processing, and useful fixation methods are available (WO 2019/236599). Fixation can be useful to preserve a sample or to maintain contiguity of analytes from a sample, a cell, or a nucleus. Fixation methods preserve and stabilize tissue, cell, and nucleus morphology and architecture; inactivate proteolytic enzymes; strengthen samples, cells, and nuclei so they can withstand further processing and staining; and protect against contamination. Examples of methods where fixation can be useful include, but are not limited to, whole genome sequencing of isolated nuclei and chromosome conformation capture methods such as Hi-C. Common methods of fixation include perfusion, immersion, freezing, and drying (Srinivasan et al., Am J Pathol. 2002 Dec; 161(6): 1961-1971. doi: 10.1016/S0002-9440(10)64472-0). In some embodiments, such as whole genome sequencing, isolated nuclei can be processed to dissociate nucleosomes from DNA while leaving the nuclei intact, and methods for generating nucleosome-free nuclei are available (WO 2018/018008).
In some embodiments, primary nucleic acids in bulk, e.g., from a plurality of cells, can be used to produce a sequencing library as described herein. In other embodiments, individual cells or nuclei can be used as sources of primary nucleic acids to obtain sequence information from single cells and nuclei. Many different single cell library preparation methods are known in the art, including, but not limited to, Drop-seq, Seq-well, and single cell combinatorial indexing ("sci-") methods. Companies providing single cell products and related technologies include, but are not limited to, Illumina, 10X Genomics, Takara Biosciences, BD Biosciences, Bio-Rad Laboratories, Icellbio, Isoplexis, CellSee, NanoSelect, and Dolomite Bio. Sci-seq is a methodological framework that employs split-pool barcoding to uniquely label the nucleic acid contents of large numbers of single cells or nuclei. Typically, the number of nuclei or cells is at least two. The upper limit depends on the practical limitations of the equipment (e.g., multi-well plates, number of indexes) used in other steps of the methods described herein. The number of nuclei or cells that can be used is not intended to be limiting and can number in the billions.
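Split-pool barcoding can be sketched as follows: in each round, every cell or nucleus is placed in a random well, and the sequence of wells it visits becomes its barcode. The number of possible barcodes is the product of the number of wells used in each round, which is one of the practical limits noted above. The plate layout (two rounds of 96 wells), the cell count, and the function names are hypothetical; the sketch also compares the simulated fraction of uniquely labeled cells with the value expected under uniform random assignment, (1 - 1/B)^(N - 1) for B barcodes and N cells.

```python
import math
import random
from collections import Counter


def split_pool_barcodes(n_cells, wells_per_round, seed=0):
    """Simulate split-pool indexing: in each round every cell lands in a
    uniformly random well, and the tuple of wells visited is its barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(w) for w in wells_per_round)
            for _ in range(n_cells)]


def fraction_uniquely_labeled(barcodes):
    """Fraction of cells whose barcode is not shared with any other cell."""
    counts = Counter(barcodes)
    return sum(1 for b in barcodes if counts[b] == 1) / len(barcodes)


rounds = (96, 96)            # hypothetical: two rounds of 96-well plates
n_cells = 1000               # hypothetical number of cells or nuclei
combinations = math.prod(rounds)
barcodes = split_pool_barcodes(n_cells, rounds, seed=1)
observed = fraction_uniquely_labeled(barcodes)
expected = (1 - 1 / combinations) ** (n_cells - 1)
print(combinations, round(observed, 3), round(expected, 3))
```

Adding rounds or wells multiplies the barcode space, which is why the achievable number of cells is bounded by plate and index capacity rather than by the chemistry itself.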
The target nucleic acids used in the methods and compositions of the present disclosure can be derived by fragmentation. Random fragmentation refers to the fragmentation of a polynucleotide molecule from a primary nucleic acid sample in a non-ordered fashion by enzymatic, chemical, or mechanical methods. Such fragmentation methods are standard in the art (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). Moreover, random fragmentation is designed to produce fragments irrespective of the sequence identity or position of nucleotides comprising and/or surrounding the break. In one embodiment, the random fragmentation is by mechanical means, such as nebulization or sonication, to produce fragments of about 50 to about 1500 base pairs in length, more particularly 50-700 base pairs, yet more particularly 50-400 base pairs. Most particularly, the method is used to generate smaller fragments of 50-150 base pairs in length.
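A minimal sketch of sequence-independent random fragmentation followed by selection of the 50-150 bp window is shown below. It models only the positions of breaks, chosen uniformly at random as an idealization of nebulization or sonication, not the physics of shearing; the function names and the molecule length are hypothetical.

```python
import random


def random_fragment(length, n_breaks, seed=0):
    """Cut a molecule of `length` bp at `n_breaks` random positions.

    Break positions are drawn uniformly and independently of sequence,
    an idealized stand-in for mechanical shearing.
    """
    rng = random.Random(seed)
    # Distinct interior break points; sorting gives fragment boundaries.
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + cuts + [length]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(fragment_lengths, min_bp=50, max_bp=150):
    """Keep fragments whose length falls within [min_bp, max_bp]."""
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]


fragments = random_fragment(length=100_000, n_breaks=999, seed=42)
selected = size_select(fragments)
print(f"{len(fragments)} fragments, {len(selected)} in the 50-150 bp window")
```

Because break positions ignore sequence, the fragment-length distribution depends only on the number of breaks, and size selection then sets the final window.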
Fragmentation of polynucleotide molecules by mechanical means (nebulization, sonication, and Hydroshear, for example) results in fragments with a heterogeneous mix of blunt and 3'- and 5'-overhanging ends. It is therefore desirable to repair the fragment ends using methods or kits known in the art (such as the Lucigen DNA terminator End Repair Kit) to generate ends that are optimal for insertion, for example, into blunt sites of cloning vectors. In a particular embodiment, the fragment ends of the population of nucleic acids are blunt ended. More particularly, the fragment ends are blunt ended and phosphorylated. The phosphate moiety can be introduced via enzymatic treatment, for example, using polynucleotide kinase.
In a particular embodiment, the target fragment sequences are prepared with single overhanging nucleotides by, for example, the activity of certain types of DNA polymerase, such as Taq polymerase or Klenow exo minus polymerase, which have a non-template-dependent terminal transferase activity that adds a single deoxynucleotide, for example, deoxyadenosine (A), to the 3' ends of a DNA molecule, such as a PCR product. Such enzymes can be used to add a single nucleotide 'A' to the blunt-ended 3' terminus of each strand of the double-stranded target fragments. Thus, an 'A' can be added to the 3' terminus of each end-repaired strand of the double-stranded target fragments by reaction with Taq or Klenow exo minus polymerase, while the universal adapter polynucleotide construct can be a T-construct with a compatible 'T' overhang present on the 3' terminus of each region of double-stranded nucleic acid of the universal adapter. This end modification also prevents self-ligation of both vector and target, such that there is a bias towards formation of target nucleic acids having a universal adapter at each end.
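The logic of A-tailing can be captured in a simplified compatibility check, sketched below under the assumption that two ends ligate only when both are blunt or when their single-base 3' overhangs are complementary. The function name is hypothetical; the sketch shows why A-tailed fragments join T-overhang adapters while neither A-tailed fragments nor T-overhang adapters join to themselves.

```python
# Watson-Crick pairing used to test single-base overhang compatibility.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def ends_can_ligate(overhang_1: str, overhang_2: str) -> bool:
    """Simplified model: an empty string is a blunt end; a one-letter
    string is a single-base 3' overhang."""
    if overhang_1 == "" and overhang_2 == "":
        return True  # blunt-to-blunt ligation
    if len(overhang_1) == 1 and len(overhang_2) == 1:
        return PAIRS[overhang_1] == overhang_2
    return False  # blunt end versus overhang, or longer overhangs (not modeled)


fragment_end = "A"  # 3' dA added by Taq or Klenow exo minus polymerase
adapter_end = "T"   # 3' dT overhang on the universal adapter

print("fragment-adapter:", ends_can_ligate(fragment_end, adapter_end))    # True
print("fragment-fragment:", ends_can_ligate(fragment_end, fragment_end))  # False
print("adapter-adapter:", ends_can_ligate(adapter_end, adapter_end))      # False
```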
In one embodiment, fragmentation can be accomplished using a process often referred to as tagmentation. Tagmentation uses a transposome complex and combines fragmentation and ligation of universal adapters into a single step (WO 2016/130704). A transposome complex is a transposase bound to a transposase recognition site, and it can insert the transposase recognition site into a target nucleic acid in a process sometimes termed "tagmentation." In some such insertion events, one strand of the transposase recognition site may be transferred into the target nucleic acid. Such a strand is referred to as a "transferred strand." In one embodiment, a transposome complex includes a dimeric transposase having two subunits and two non-contiguous transposon sequences. In another embodiment, a transposome complex includes a dimeric transposase having two subunits and a contiguous transposon sequence.
Some embodiments can include the use of a hyperactive Tn5 transposase and a Tn5-type transposase recognition site (Goryshin and Reznikoff, J. Biol. Chem., 273:7367 (1998)), or MuA transposase and a Mu transposase recognition site comprising R1 and R2 end sequences (Mizuuchi, K., Cell, 35: 785, 1983; Savilahti, H, et al., EMBO J., 14: 4893, 1995). Tn5 Mosaic End (ME) sequences can also be used by a skilled artisan.
Examples of transposon sequences useful with the methods and compositions described herein are provided in U.S. Patent Application Pub. No. 2012/0208705, U.S. Patent Application Pub. No. 2012/0208724 and Int. Patent Application Pub. No. WO 2012/061832. In some embodiments, a transposon sequence includes a first transposase recognition site and a second transposase recognition site.
Some transposome complexes useful herein include a transposase having two transposon sequences. In some such embodiments, the two transposon sequences are not linked to one another, in other words, the transposon sequences are non-contiguous with one another. Examples of such transposomes are known in the art (see, for instance, U.S. Patent Application Pub. No. 2010/0120098).
In one embodiment, tagmentation is used to produce target nucleic acids that include different universal sequences at each end. This can be accomplished by using two types of transposome complexes, where each transposome complex includes a different nucleotide sequence that is part of the transferred strand.
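To illustrate how two types of transposome complexes yield fragments carrying different universal sequences at their ends, the sketch below assumes that each new fragment end independently receives sequence A with probability p and sequence B otherwise. Under that assumption, which is not stated in the text, the expected fraction of A-B fragments is 2p(1 - p), i.e., one half when the two complexes are used in equal amounts. The function name is hypothetical.

```python
import random


def end_pair_fractions(n_fragments, p_a=0.5, seed=0):
    """Simulate which universal sequence (A or B) lands on each fragment end.

    Assumes each end independently receives A with probability p_a.
    """
    rng = random.Random(seed)
    counts = {"A-A": 0, "A-B": 0, "B-B": 0}
    for _ in range(n_fragments):
        left = "A" if rng.random() < p_a else "B"
        right = "A" if rng.random() < p_a else "B"
        counts["-".join(sorted((left, right)))] += 1
    return {pair: n / n_fragments for pair, n in counts.items()}


simulated = end_pair_fractions(100_000, seed=7)
expected_ab = 2 * 0.5 * (1 - 0.5)
print(simulated, "expected A-B:", expected_ab)
```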
A population of target nucleic acids can have an average strand length that is desired or appropriate for a particular application of the methods, compositions, or kits set forth herein. For example, the average strand length can be less than about 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 100 nucleotides, or 50 nucleotides. Alternatively or additionally, the average strand length can be greater than about 10 nucleotides, 50 nucleotides, 100 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides, 50,000 nucleotides, or 100,000 nucleotides. The average strand length for a population of target nucleic acids can be in a range between a maximum and minimum value set forth herein. It will be understood that amplicons generated at an amplification site (or
otherwise made or used herein) can have an average strand length that is in a range between an upper and lower limit selected from those exemplified above.
In some cases, a population of target nucleic acids can be produced under conditions or otherwise configured to have a maximum length for its members. For example, the maximum length for the members that are used in one or more steps of a method set forth herein or that are present in a particular composition can be less than 100,000 nucleotides, less than 50,000 nucleotides, less than 10,000 nucleotides, less than 5,000 nucleotides, less than 1,000 nucleotides, less than 500 nucleotides, less than 100 nucleotides, or less than 50 nucleotides. Alternatively or additionally, a population of target nucleic acids can be produced under conditions or otherwise configured to have a minimum length for its members. For example, the minimum length for the members that are used in one or more steps of a method set forth herein or that are present in a particular composition can be more than 10 nucleotides, more than 50 nucleotides, more than 100 nucleotides, more than 500 nucleotides, more than 1,000 nucleotides, more than 5,000 nucleotides, more than 10,000 nucleotides, more than 50,000 nucleotides, or more than 100,000 nucleotides. The maximum and minimum strand length for target nucleic acids in a population can be in a range between a maximum and minimum value set forth above. It will be understood that amplicons generated at an amplification site (or otherwise made or used herein) can have maximum and/or minimum strand lengths in a range between the upper and lower limits exemplified above.
In some embodiments, a sample can be enriched for sequences of interest, e.g., a predetermined sequence. For example, a subset of genes or regions of the genome are isolated and sequenced, or a subset of genes or regions of the genome are interrogated by other methods, such as a locus-specific in vitro diagnostic method. A predetermined sequence can be, for instance, one that can have a pattern of cytosine modification.
In some embodiments, target enrichment works by capturing genomic regions of interest by hybridization to target-specific probes that can be used to physically separate target DNA that has hybridized to bait probes from all other DNA in solution, which are then washed away. For example, some methods of enrichment use biotinylated probes, which are then isolated by magnetic pulldown with streptavidin-coated magnetic particles. In another example, some methods of enrichment use analyte arrays, also known as microarrays, that allow for the hybridization of predetermined sequences.
Enrichment can occur, for example, prior to treatment with an engineered deaminase. In such embodiments, enriching a nucleic acid of interest, or a fragment thereof, such as enriching DNA in a sample, may include any suitable enrichment technique. In some embodiments, enrichment of DNA may include enrichment through molecular inversion probes, in-solution capture, pulldown probes, bait sets, standard PCR, multiplex PCR, hybrid capture, endonuclease digestion, DNase I hypersensitivity, or selective circularization. Enrichment can also be achieved through negative selection, i.e., by eliminating undesired material. This sort of enrichment includes 'footprinting' techniques and 'subtractive' hybrid capture. In footprinting, the target nucleic acids are protected from nuclease activity by bound proteins or by their single- or double-stranded configuration. In subtractive hybrid capture, nucleic acids that bind 'bait' probes are eliminated.
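Positive (capture) and negative (subtractive) selection can be contrasted with a toy model in which a bait probe "hybridizes" to a fragment if either strand of the fragment contains the probe's exact complement. This exact-match rule is an idealization of hybridization (real capture tolerates mismatches and depends on stringency), and the bait and fragment sequences and function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hybridizes(fragment: str, probe: str) -> bool:
    """Idealized binding: the given strand contains the probe's reverse
    complement, or the other strand does (i.e., the given strand contains
    the probe sequence itself)."""
    return revcomp(probe) in fragment or probe in fragment


def capture(fragments, probes):
    """Positive selection: keep fragments that bind any bait probe."""
    return [f for f in fragments if any(hybridizes(f, p) for p in probes)]


def subtract(fragments, probes):
    """Subtractive capture: eliminate fragments that bind any bait probe."""
    return [f for f in fragments if not any(hybridizes(f, p) for p in probes)]


baits = ["GATTACA"]  # hypothetical bait sequence
pool = ["CCGTGTAATCGG", "AAAAAAAAAA", "GATTACAGG"]
print(capture(pool, baits))
print(subtract(pool, baits))
```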
In some embodiments, enriching can comprise amplification using target-specific primers. In some embodiments, amplification is performed subsequent to another form of enrichment. Typically, however, in embodiments where amplification is used for enrichment, the amplification step occurs after treatment with deaminase, to preserve methylation status of the target DNA. In some such embodiments, amplification can include PCR amplification or genome-wide amplification.
In some embodiments, enrichment can occur after treatment with an engineered deaminase. Typical methods used to identify methylated cytosines convert unmethylated cytosines to uracil, which results in a loss of DNA complexity (a largely three-base genome) and limits the use of sequences that specifically hybridize to a predetermined sequence. Accordingly, typical methods for identifying methylated cytosines are more difficult to combine with enrichment performed after conversion, such as hybrid-enrichment sequencing and amplicon-based targeted sequencing. In contrast, engineered deaminases convert 5mC to T, and only a small percentage of cytosines are methylated and therefore expected to be converted by an engineered deaminase; most of the sequence complexity of the target nucleic acid is thus retained, and enrichment can be performed after treatment. Examples of enrichment-based methods that can be used after treatment of a target nucleic acid with an engineered deaminase include, but are not limited to, analyte arrays, use of primers for selective amplification, CRISPR-Cas systems, and molecular cytogenetic techniques such as FISH. Examples of arrays include, for instance, methylation arrays for interrogation of selected methylation sites across a genome (e.g., the Infinium MethylationEPIC BeadChip, Illumina).
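The difference in retained complexity can be illustrated with a toy sequence. The sketch below assumes an idealized, complete reaction in each case: the engineered deaminase converts only 5mC to T, whereas a conventional (bisulfite-like) method converts every unmethylated C to U, which is read as T. The function names, example sequence, and methylated position are hypothetical.

```python
def engineered_deaminase(seq, methylated):
    """5mC -> T at the listed methylated positions; all else unchanged."""
    return "".join("T" if i in methylated and b == "C" else b
                   for i, b in enumerate(seq))


def unmethylated_c_conversion(seq, methylated):
    """Bisulfite-like conversion: every unmethylated C -> U, read as T."""
    return "".join("T" if b == "C" and i not in methylated else b
                   for i, b in enumerate(seq))


def mismatches(a, b):
    """Positions at which a converted read differs from the reference."""
    return sum(x != y for x, y in zip(a, b))


reference = "ACGTCCGATCGA"
methylated = {5}  # hypothetical: only the C at index 5 (a CpG) is 5mC

deam = engineered_deaminase(reference, methylated)
bis = unmethylated_c_conversion(reference, methylated)
print(deam, mismatches(reference, deam))  # one base changed
print(bis, mismatches(reference, bis))    # every unmethylated C changed
```

Because the deaminase-treated read differs from the reference only at methylated sites, probes and primers designed against the unconverted reference remain largely usable.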
Attachment of Universal Adapters
In some embodiments, a target nucleic acid used in a method, composition, or kit described herein can include a universal adapter attached to each end. A target nucleic acid having a universal adapter at each end can be referred to as a "modified target nucleic acid." Methods for attaching a universal adapter to each end of a target nucleic acid used in a method described herein are known to the person skilled in the art. The attachment can be through tagmentation using transposase complexes (WO 2016/130704), or through standard library preparation techniques using ligation (U.S. Pat. Pub. No. 2018/0305753). Attachment of a universal adapter to the ends of a target nucleic acid can occur before or after treatment of the target nucleic acid with an engineered deaminase.
In one embodiment, double-stranded target nucleic acids from a sample, e.g., a fragmented sample that has been contacted with an engineered deaminase and converted from single-stranded to double-stranded nucleic acids, are treated by first ligating identical universal adaptor molecules to the 5' and 3' ends of the double-stranded target nucleic acids. In one embodiment, the universal adapters are "matched" adapters because the two strands of the adaptors are formed by annealing complementary polynucleotide strands. In one embodiment, the universal adapters used in the method of the disclosure are referred to as "mismatched" adaptors (e.g., Y-adapters) because the adaptors include a region of sequence mismatch, i.e., they are not formed by annealing fully complementary polynucleotide strands. The general features of mismatched adaptors are further described in Gormley et al., U.S. Pat. No. 7,741,463, and Bignell et al., U.S. Pat. No. 8,053,192. The universal adaptor typically includes universal capture binding sequences that aid in immobilizing the target nucleic acids on an array for subsequent sequencing, and universal primer binding sites useful for the sequencing. In another embodiment, double-stranded target nucleic acids from a sample, e.g., a sample that has been contacted with an engineered deaminase and converted from single-stranded to double-stranded nucleic acids, are subjected to tagmentation with a transposome complex that inserts a universal adapter, or sequences that can be used to add a universal adapter, into a target nucleic acid.
A universal adapter can optionally include at least one index. An index can be used as a marker characteristic of the source of particular target nucleic acids on a flow cell (U.S. Pat. No. 8,053,192). Generally, the index is a synthetic sequence of nucleotides that is part of the universal adapter which is added to the target nucleic acids as part of the library preparation step. Accordingly, an index is a nucleic acid sequence which is attached to each of the target molecules
of a particular sample, the presence of which is indicative of, or is used to identify, the sample or source from which the target molecules were isolated.
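Because each index marks the sample of origin, reads can be assigned back to their samples by comparing each index read with the known indexes. The sketch below is a hypothetical demultiplexing helper; the sample names, index sequences, and the one-mismatch tolerance are illustrative assumptions, not values from this disclosure.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=1):
    """Return the sample whose index is within max_mismatches of the read,
    or None if no sample, or more than one sample, matches."""
    hits = [name for name, index in sample_indexes.items()
            if len(index) == len(index_read)
            and hamming(index, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


samples = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}  # hypothetical indexes
for read in ("ACGTAC", "ACGTAA", "ACGAAA"):
    print(read, assign_sample(read, samples))
```

Returning None for ambiguous or distant reads, rather than forcing an assignment, keeps sequencing errors in the index from moving reads between samples.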
Preferably an index may be up to 20 nucleotides in length, more preferably 1-10 nucleotides, and most preferably 4-6 nucleotides in length. A four nucleotide index gives a possibility of multiplexing 256 samples on the same array, a six base index enables 4096 samples to be processed on the same array.
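The multiplexing figures above follow from there being four possible bases at each index position, so an index of n nucleotides has at most 4^n distinct sequences. The short check below reproduces the 256 and 4,096 values; the function name is hypothetical, and the result is an upper bound only, since practical index sets typically exclude sequences that are too similar to one another.

```python
def max_unique_indexes(index_length: int) -> int:
    """Upper bound on distinct indexes: 4 possible bases per position."""
    return 4 ** index_length


for n in (4, 6):
    print(f"{n}-nt index: up to {max_unique_indexes(n)} samples")
```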
The precise nucleotide sequence of the universal adapters is generally not material to the disclosure and may be selected by the user such that the desired sequence elements are ultimately included in the common sequences of the plurality of different modified target nucleic acids, for example, to provide for the universal capture binding sequences for immobilizing the target nucleic acids on an array for subsequent sequencing, and binding sites for particular sets of universal amplification primers and/or sequencing primers. Additional sequence elements may be included, for example, to provide binding sites for sequencing primers which will ultimately be used in sequencing of target nucleic acids in the library, sequencing of an index, or products derived from amplification of the target nucleic acids in the library, for example on a solid support.
In order to prepare a library of deaminase-treated DNA for analysis using a sequencing platform, it may be useful to make additional modifications to the target DNA, either prior to or after treatment with engineered deaminase. In some embodiments, single-stranded deaminase-treated DNA is prepared for sequencing using a single-stranded library preparation method, as is known in the art. Such methods include, but are not limited to, template-switching-based second strand synthesis, adapters containing a single-stranded splint overhang, and the like. Reagents for performing single-stranded library preparation methods are commercially available. Examples include the xGen ssDNA & Low-Input DNA Library Prep Kit (Integrated DNA Technologies catalog number 10009859), previously sold as Accel-NGS (Swift Biosciences), and the NGS Single Stranded DNA Library Prep Kit (BioDynami catalog number 30082). Another example is the single-reaction single-stranded library (SRSLY) method set forth in Troll et al., BMC Genomics 20, 1023 (2019).
In some embodiments, library preparation modifications are made to double-stranded target DNA prior to treatment with engineered deaminase. Methods for library preparation of double-stranded DNA template are known in the art and include Y-adaptor ligation, transposome-based tagmentation, and the like. It will be appreciated by those of skill in the art that methods of double-strand library preparation often include one or more amplification steps using, for example, PCR. In such methods, the amplification step may be deferred until after engineered deaminase treatment to preserve the methylation status of the template strand. For example, in Y-adapter ligation methods, the Y-adapters can be ligated to the double-stranded template, after which the adapter-ligated template DNA is denatured and treated with engineered deaminase as described elsewhere herein. Following treatment with engineered deaminase, the resulting treated single-stranded DNA molecules can be amplified using PCR, bridge amplification, or other methods commonly known in the art.
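The reason for deferring amplification can be shown with a toy model, sketched below under two stated assumptions: PCR copies the base sequence but does not copy methylation, and the engineered deaminase converts every 5mC to T. Strand orientation is ignored, and the function names and example template are hypothetical. Treating before amplifying records the methylated position as a T that PCR then copies faithfully, whereas amplifying first yields unmethylated copies in which the mark can no longer be detected.

```python
def deaminate(seq, methylated):
    """Engineered deaminase (idealized): 5mC at listed positions -> T."""
    return "".join("T" if i in methylated and b == "C" else b
                   for i, b in enumerate(seq))


def amplify(seq, methylated):
    """Idealized PCR: copies the base sequence but not methylation marks,
    so the copies carry no 5mC. Strand orientation is ignored."""
    return seq, set()


template = "GACGTTACGA"
marks = {2}  # hypothetical 5mC at index 2 (a CpG)

# Order described above: deaminase treatment first, then amplification.
treated = deaminate(template, marks)
treat_then_amplify, _ = amplify(treated, set())

# Reversed order: amplification first erases the marks before treatment.
copies, copy_marks = amplify(template, marks)
amplify_then_treat = deaminate(copies, copy_marks)

print(treat_then_amplify)  # GATGTTACGA: methylated C recorded as T
print(amplify_then_treat)  # GACGTTACGA: methylation information lost
```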
Preparation of Immobilized Samples for Sequencing
The library of modified target nucleic acids, e.g., target nucleic acids having universal adapters at each end, can be prepared for sequencing. Methods for attaching modified target nucleic acids to a substrate are known in the art. In one embodiment, modified fragments are enriched using a plurality of capture oligonucleotides having specificity for the modified fragments, and the capture oligonucleotides can be immobilized on a surface of a solid substrate such as a flow cell or a bead. For instance, capture oligonucleotides can include a first member of a universal binding pair, where a second member of the binding pair is immobilized on a surface of a solid substrate. Likewise, methods for amplifying immobilized target nucleic acids include, but are not limited to, bridge amplification and exclusion amplification (also referred to as kinetic exclusion amplification (KEA)). Methods for immobilizing and amplifying prior to sequencing are described in, for instance, Bignell et al. (US 8,053,192), Gunderson et al. (WO2016/130704), Shen et al. (US 8,895,249), and Pipenburg et al. (US 9,309,502).
A pooled sample can be immobilized in preparation for sequencing. Sequencing can be performed on an array of single molecules, or the molecules can be amplified prior to sequencing. The amplification can be carried out using one or more immobilized primers. The immobilized primer(s) can be, for instance, a lawn on a planar surface or on a pool of beads. The pool of beads can be isolated into an emulsion with a single bead in each "compartment" of the emulsion. At a concentration of only one template per "compartment," only a single template is amplified on each bead.
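The requirement for roughly one template per compartment can be made quantitative if one assumes, as is common for random partitioning but not stated in the text, that the number of templates per compartment follows a Poisson distribution with mean λ. The sketch below (function name hypothetical) computes the fractions of empty, single-template, and multi-template compartments; lowering λ reduces the fraction of compartments holding mixed templates at the cost of more empty compartments.

```python
import math


def occupancy(mean_templates: float):
    """Poisson fractions of compartments holding 0, 1, or >1 templates."""
    p_empty = math.exp(-mean_templates)
    p_single = mean_templates * math.exp(-mean_templates)
    p_multiple = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multiple


for lam in (0.1, 0.5, 1.0):
    empty, single, multiple = occupancy(lam)
    print(f"lambda={lam}: empty={empty:.3f} "
          f"single={single:.3f} multiple={multiple:.3f}")
```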
The term "solid-phase amplification" as used herein refers to any nucleic acid amplification reaction carried out on or in association with a solid support such that all or a portion of the amplified products are immobilized on the solid support as they are formed. In particular, the term
encompasses solid-phase polymerase chain reaction (solid-phase PCR) and solid phase isothermal amplification which are reactions analogous to standard solution phase amplification, except that one or both of the forward and reverse amplification primers is/are immobilized on the solid support. Solid phase PCR covers systems such as emulsions, where one primer is anchored to a bead and the other is in free solution, and colony formation in solid phase gel matrices wherein one primer is anchored to the surface, and one is in free solution.
In some embodiments, the solid support comprises a patterned surface. A "patterned surface" refers to an arrangement of different regions in or on an exposed layer of a solid support. For example, one or more of the regions can be features where one or more amplification primers are present. The features can be separated by interstitial regions where amplification primers are not present. In some embodiments, the pattern can be an x-y format of features that are in rows and columns. In some embodiments, the pattern can be a repeating arrangement of features and/or interstitial regions. In some embodiments, the pattern can be a random arrangement of features and/or interstitial regions. Exemplary patterned surfaces that can be used in the methods and compositions set forth herein are described in U.S. Pat. Nos. 8,778,848, 8,778,849 and 9,079,148, and U.S. Pat. Appl. Pub. No. 2014/0243224.
In some embodiments, the solid support includes an array of wells or depressions in a surface. Such an array may be fabricated as is generally known in the art using a variety of techniques, including, but not limited to, photolithography, stamping techniques, molding techniques, and micro-etching techniques. As will be appreciated by those of skill in the art, the technique used will depend on the composition and shape of the array substrate.
The features in a patterned surface can be wells in an array of wells (e.g., microwells or nanowells) on glass, silicon, plastic, or other suitable solid supports with patterned, covalently linked gel such as poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide) (PAZAM, see, for example, US Pub. No. 2013/184796, WO 2016/066586, and WO 2015/002813). The process creates gel pads used for sequencing that can be stable over sequencing runs with a large number of cycles. The covalent linking of the polymer to the wells is helpful for maintaining the gel in the structured features throughout the lifetime of the structured substrate during a variety of uses. However, in many embodiments the gel need not be covalently linked to the wells. For example, in some conditions silane-free acrylamide (SFA, see, for example, US Pat. No. 8,563,477), which is not covalently attached to any part of the structured substrate, can be used as the gel material.
In particular embodiments, a structured substrate can be made by patterning a solid support material with wells (e.g., microwells or nanowells), coating the patterned support with a gel material (e.g., PAZAM, SFA, or chemically modified variants thereof, such as the azidolyzed version of SFA (azido-SFA)) and polishing the gel coated support, for example via chemical or mechanical polishing, thereby retaining gel in the wells but removing or inactivating substantially all of the gel from the interstitial regions on the surface of the structured substrate between the wells. Primer nucleic acids can be attached to gel material. A solution of modified target nucleic acids can then be contacted with the polished substrate such that individual modified target nucleic acids will seed individual wells via interactions with primers attached to the gel material; however, the target nucleic acids will not occupy the interstitial regions due to absence or inactivity of the gel material. Amplification of the modified target nucleic acids will be confined to the wells since absence or inactivity of gel in the interstitial regions prevents outward migration of the growing nucleic acid colony. The process can be conveniently manufactured, being scalable and utilizing conventional micro- or nanofabrication methods.
Although the disclosure encompasses "solid-phase" amplification methods in which only one amplification primer is immobilized (the other primer usually being present in free solution), in one embodiment the solid support is provided with both the forward and the reverse primers immobilized. In practice, there will be a plurality of identical forward primers and/or a plurality of identical reverse primers immobilized on the solid support, since the amplification process requires an excess of primers to sustain amplification. References herein to forward and reverse primers are to be interpreted accordingly as encompassing a plurality of such primers unless the context indicates otherwise.
As will be appreciated by the skilled reader, any given amplification reaction requires at least one type of forward primer and at least one type of reverse primer specific for the template to be amplified. However, in certain embodiments the forward and reverse primers may include template-specific portions of identical sequence, and may have entirely identical nucleotide sequence and structure (including any non-nucleotide modifications). In other words, it is possible to carry out solid-phase amplification using only one type of primer, and such single-primer methods are encompassed within the scope of the disclosure. Other embodiments may use forward and reverse primers which contain identical template-specific sequences but which differ in some
other structural features. For example, one type of primer may contain a non-nucleotide modification which is not present in the other.
Primers for solid-phase amplification are preferably immobilized by single point covalent attachment to the solid support at or near the 5' end of the primer, leaving the template-specific portion of the primer free to anneal to its cognate template and the 3' hydroxyl group free for primer extension. Any suitable covalent attachment means known in the art may be used for this purpose. The chosen attachment chemistry will depend on the nature of the solid support, and any derivatization or functionalization applied to it. The primer itself may include a moiety, which may be a non-nucleotide chemical modification, to facilitate attachment. In a particular embodiment, the primer may include a sulphur-containing nucleophile, such as phosphorothioate or thiophosphate, at the 5' end. In the case of solid-supported polyacrylamide hydrogels, this nucleophile will bind to a bromoacetamide group present in the hydrogel. A more particular means of attaching primers and templates to a solid support is via 5' phosphorothioate attachment to a hydrogel comprised of polymerized acrylamide and N-(5-bromoacetamidylpentyl) acrylamide (BRAPA), as described in Int. Pub. No. WO 05/065814.
Certain embodiments of the disclosure may make use of solid supports that include an inert substrate or matrix (e.g., glass slides, polymer beads, etc.) which has been "functionalized," for example by application of a layer or coating of an intermediate material including reactive groups which permit covalent attachment to biomolecules, such as polynucleotides. Examples of such supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate such as glass. In such embodiments, the biomolecules (e.g., polynucleotides) may be directly covalently attached to the intermediate material (e.g., the hydrogel), but the intermediate material may itself be non-covalently attached to the substrate or matrix (e.g., the glass substrate). The term "covalent attachment to a solid support" is to be interpreted accordingly as encompassing this type of arrangement.
The pooled samples may be amplified on beads wherein each bead contains a forward and reverse amplification primer. In one embodiment, a library of modified target nucleic acids is used to prepare clustered arrays of nucleic acid colonies, analogous to those described in U.S. Pub. No. 2005/0100900, U.S. Pat. No. 7,115,400, WO 00/18957 and WO 98/44151, by solid-phase amplification and more particularly solid-phase isothermal amplification. The terms "cluster" and "colony" are used interchangeably herein to refer to a discrete site on a solid support including a plurality of identical immobilized nucleic acid strands and a plurality of identical immobilized complementary nucleic acid strands. The term "clustered array" refers to an array formed from such clusters or colonies. In this context, the term "array" is not to be understood as requiring an ordered arrangement of clusters.
The term "solid phase" or "surface" is used to mean either a planar array wherein primers are attached to a flat surface, for example, glass, silica or plastic microscope slides or similar flow cell devices; beads, wherein either one or two primers are attached to the beads and the beads are amplified; or an array of beads on a surface after the beads have been amplified.
Clustered arrays can be prepared using either a process of thermocycling, as described in WO 98/44151, or a process whereby the temperature is maintained as a constant, and the cycles of extension and denaturing are performed using changes of reagents. Such isothermal amplification methods are described in patent application numbers WO 02/46456 and U.S. Pub. No. 2008/0009420.
It will be appreciated that any of the amplification methodologies described herein or generally known in the art may be used with universal or target-specific primers to amplify immobilized DNA fragments. Suitable methods for amplification include, but are not limited to, the polymerase chain reaction (PCR), strand displacement amplification (SDA), transcription mediated amplification (TMA) and nucleic acid sequence-based amplification (NASBA), as described in U.S. Pat. No. 8,003,354. The above amplification methods may be employed to amplify one or more nucleic acids of interest. For example, PCR, including multiplex PCR, SDA, TMA, NASBA and the like may be utilized to amplify immobilized DNA fragments. In some embodiments, primers directed specifically to the polynucleotide of interest are included in the amplification reaction.
Other suitable methods for amplification of polynucleotides may include oligonucleotide extension and ligation, rolling circle amplification (RCA) (Lizardi et al., Nat. Genet. 19:225-232 (1998)) and oligonucleotide ligation assay (OLA) (see generally U.S. Pat. Nos. 7,582,420, 5,185,243, 5,679,524 and 5,573,907; EP 0 320 308 B1; EP 0 336 731 B1; EP 0 439 182 B1; WO 90/01069; WO 89/12696; and WO 89/09835) technologies. It will be appreciated that these amplification methodologies may be designed to amplify immobilized DNA fragments. For example, in some embodiments, the amplification method may include ligation probe amplification or oligonucleotide ligation assay (OLA) reactions that contain primers directed specifically to the nucleic acid of interest. In some embodiments, the amplification method may include a primer extension-ligation reaction that contains primers directed specifically to the nucleic acid of interest. As a non-limiting example of primer extension and ligation primers that may be specifically designed to amplify a nucleic acid of interest, the amplification may include primers used for the GoldenGate assay (Illumina, Inc., San Diego, CA) as exemplified by U.S. Pat. Nos. 7,582,420 and 7,611,869.
DNA nanoballs can also be used in combination with methods, systems, compositions and kits as described herein. Methods for creating and using DNA nanoballs for genomic sequencing can be found in, for example, U.S. Pat. No. 7,910,354 and U.S. Pub. Nos. 2009/0264299, 2009/0011943, 2009/0005252, 2009/0155781 and 2009/0118488, and as described in, for example, Drmanac et al. (2010, Science 327(5961): 78-81). Briefly, following production of modified target nucleic acids, the modified target nucleic acids are circularized and amplified by rolling circle amplification (Lizardi et al., 1998. Nat. Genet. 19:225-232; US 2007/0099208 A1). The extended concatemeric structure of the amplicons promotes coiling, thereby creating compact DNA nanoballs. The DNA nanoballs can be captured on substrates, preferably to create an ordered or patterned array such that the distance between each nanoball is maintained, thereby allowing sequencing of the separate DNA nanoballs. In some embodiments, such as those used by Complete Genomics (Mountain View, Calif.), consecutive rounds of adapter addition, amplification, and digestion are carried out prior to circularization to produce head-to-tail constructs having several target nucleic acids separated by adapter sequences.
Exemplary isothermal amplification methods that may be used in a method of the present disclosure include, but are not limited to, Multiple Displacement Amplification (MDA) as exemplified by, for example, Dean et al., Proc. Natl. Acad. Sci. USA 99:5261-66 (2002), or isothermal strand displacement nucleic acid amplification exemplified by, for example, U.S. Pat. No. 6,214,587. Other non-PCR-based methods that may be used in the present disclosure include, for example, strand displacement amplification (SDA), which is described in, for example, Walker et al., Molecular Methods for Virus Detection, Academic Press, Inc., 1995; U.S. Pat. Nos. 5,455,166 and 5,130,238; and Walker et al., Nucl. Acids Res. 20:1691-96 (1992), or hyperbranched strand displacement amplification, which is described in, for example, Lage et al., Genome Res. 13:294-307 (2003). Isothermal amplification methods may be used with, for instance, the strand-displacing phi29 polymerase or the Bst DNA polymerase large fragment, 5'→3' exo-, for random-primer amplification of genomic DNA. The use of these polymerases takes advantage of their high processivity and strand-displacing activity. High processivity allows the polymerases to produce fragments that are 10-20 kb in length. As set forth above, smaller fragments may be produced under isothermal conditions using polymerases having low processivity and strand-displacing activity, such as Klenow polymerase. Additional description of amplification reactions, conditions and components is set forth in detail in the disclosure of U.S. Patent No. 7,670,810.
In some embodiments, isothermal amplification can be performed using kinetic exclusion amplification (KEA), also referred to as exclusion amplification (ExAmp). A nucleic acid library of the present disclosure can be made using a method that includes a step of reacting an amplification reagent to produce a plurality of amplification sites that each includes a substantially clonal population of amplicons from an individual target nucleic acid that has seeded the site. In some embodiments, the amplification reaction proceeds until a sufficient number of amplicons are generated to fill the capacity of the respective amplification site. Filling an already seeded site to capacity in this way inhibits target nucleic acids from landing and amplifying at the site thereby producing a clonal population of amplicons at the site. In some embodiments, apparent clonality can be achieved even if an amplification site is not filled to capacity prior to a second target nucleic acid arriving at the site. Under some conditions, amplification of a first target nucleic acid can proceed to a point that a sufficient number of copies are made to effectively outcompete or overwhelm production of copies from a second target nucleic acid that is transported to the site. For example, in an embodiment that uses a bridge amplification process on a circular feature that is smaller than 500 nm in diameter, it has been determined that after 14 cycles of exponential amplification for a first target nucleic acid, contamination from a second target nucleic acid at the same site will produce an insufficient number of contaminating amplicons to adversely impact sequencing-by-synthesis analysis on an Illumina sequencing platform.
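The 14-cycle figure above can be rationalized with simple arithmetic. The sketch below assumes idealized exponential amplification, in which every copy doubles each cycle and the site has not yet reached capacity; real bridge amplification is less efficient, and the function name is hypothetical. Under this assumption, a second template arriving after the first has completed `k` cycles stays outnumbered by a fixed factor of 2^k.

```python
def late_template_fraction(head_start_cycles: int) -> float:
    """Fraction of amplicons at a site derived from a late-arriving template.

    Hypothetical helper assuming ideal doubling per cycle for both templates
    once the second arrives, so their copy-number ratio stays fixed at
    2**head_start_cycles (capacity limits are ignored).
    """
    if head_start_cycles < 0:
        raise ValueError("head_start_cycles must be non-negative")
    first_template_copies = 2.0 ** head_start_cycles
    second_template_copies = 1.0
    return second_template_copies / (first_template_copies + second_template_copies)


if __name__ == "__main__":
    for cycles in (0, 5, 10, 14):
        print(f"head start of {cycles:2d} cycles -> "
              f"{late_template_fraction(cycles):.2e} contaminating fraction")
```

In this idealized model a 14-cycle head start leaves the late template with 1/16385 of the amplicons, roughly 1 in 16,000, which illustrates how such contamination can fall below a level that affects detection.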
In some embodiments, amplification sites in an array can be, but need not be, entirely clonal. Rather, for some applications, an individual amplification site can be predominantly populated with amplicons from a first modified target nucleic acid and can also have a low level of contaminating amplicons from a second modified target nucleic acid. An array can have one or more amplification sites that have a low level of contaminating amplicons so long as the level of contamination does not have an unacceptable impact on a subsequent use of the array. For example, when the array is to be used in a detection application, an acceptable level of contamination would be a level that does not impact the signal-to-noise ratio or resolution of the detection technique in an unacceptable way. Accordingly, apparent clonality will generally be relevant to a particular use or application of an array made by the methods set forth herein. Exemplary levels of contamination that can be acceptable at an individual amplification site for particular applications include, but are not limited to, at most 0.1%, 0.5%, 1%, 5%, 10% or 25% contaminating amplicons. An array can include one or more amplification sites having these exemplary levels of contaminating amplicons. For example, up to 5%, 10%, 25%, 50%, 75%, or even 100% of the amplification sites in an array can have some contaminating amplicons. It will be understood that in an array or other collection of sites, at least 50%, 75%, 80%, 85%, 90%, 95% or 99% or more of the sites can be clonal or apparently clonal.
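Because apparent clonality is judged against a use-specific contamination threshold, it can be expressed as a simple filter over per-site amplicon counts. The sketch below is illustrative only: the `Site` record, its fields, and the example counts are hypothetical, and in practice the contaminating fraction would have to be estimated from imaging or sequencing data rather than known directly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Site:
    """Hypothetical per-site amplicon tally."""
    site_id: str
    dominant_copies: int      # amplicons from the most abundant template
    contaminant_copies: int   # amplicons from all other templates combined

    def contamination(self) -> float:
        total = self.dominant_copies + self.contaminant_copies
        if total == 0:
            raise ValueError(f"site {self.site_id} has no amplicons")
        return self.contaminant_copies / total


def apparently_clonal_fraction(sites: Sequence[Site], max_contamination: float) -> float:
    """Fraction of sites whose contamination does not exceed the threshold."""
    if not sites:
        return 0.0
    passing = sum(1 for s in sites if s.contamination() <= max_contamination)
    return passing / len(sites)


if __name__ == "__main__":
    sites = [Site("a", 1000, 0), Site("b", 990, 10),
             Site("c", 900, 100), Site("d", 500, 500)]
    for threshold in (0.001, 0.01, 0.1, 0.25):
        print(threshold, apparently_clonal_fraction(sites, threshold))
```

With this toy data, relaxing the threshold from 0.1% to 25% raises the apparently clonal fraction from 25% to 75%, mirroring the point that clonality is defined relative to the intended application.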
In some embodiments, kinetic exclusion can occur when a process occurs at a sufficiently rapid rate to effectively exclude another event or process from occurring. Consider, for example, the making of a nucleic acid array in which sites of the array are randomly seeded with modified target nucleic acids from a solution and copies of the modified target nucleic acids are generated in an amplification process to fill each of the seeded sites to capacity. In accordance with the kinetic exclusion methods of the present disclosure, the seeding and amplification processes can proceed simultaneously under conditions where the amplification rate exceeds the seeding rate. As such, the relatively rapid rate at which copies are made at a site that has been seeded by a first target nucleic acid will effectively exclude a second target nucleic acid from seeding the site for amplification. Kinetic exclusion amplification methods can be performed as described in detail in the disclosure of U.S. Pat. Appl. Pub. No. 2013/0338042.
Kinetic exclusion can exploit a relatively slow rate for initiating amplification (e.g., a slow rate of making a first copy of a modified target nucleic acid) vs. a relatively rapid rate for making subsequent copies of the modified target nucleic acid (or of the first copy of the modified target nucleic acid). In the example of the previous paragraph, kinetic exclusion occurs due to the relatively slow rate at which modified target nucleic acids seed the site (e.g., relatively slow diffusion or transport) vs. the relatively rapid rate at which amplification occurs to fill the site with copies of the modified target nucleic acid seed. In another exemplary embodiment, kinetic exclusion can occur due to a delay in the formation of a first copy of a modified target nucleic acid that has seeded a site (e.g., delayed or slow activation) vs. the relatively rapid rate at which subsequent copies are made to fill the site. In this example, an individual site may have been seeded with several different modified target nucleic acids (e.g., several modified target nucleic acids can be present at each site prior to amplification). However, first copy formation for any given modified target nucleic acid can be activated randomly such that the average rate of first copy formation is relatively slow compared to the rate at which subsequent copies are generated. In this case, although an individual site may have been seeded with several different modified target nucleic acids, kinetic exclusion will allow only one of those to be amplified. More specifically, once a first modified target nucleic acid has been activated for amplification, the site will rapidly fill to capacity with its copies, thereby preventing copies of a second modified target nucleic acid from being made at the site.
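The delayed-activation form of kinetic exclusion can be illustrated with a small stochastic model. It assumes, for illustration only, that each of `n` templates seeded at a site activates independently after an exponentially distributed delay with rate `k_act`, and that once the first copy is made the site fills in a fixed time `t_fill`. Because exponential waiting times are memoryless, the site stays clonal only if none of the remaining `n - 1` templates activates within `t_fill`, giving P(clonal) = exp(-(n - 1)·k_act·t_fill). Function names, parameters, and values are hypothetical.

```python
import math
import random


def p_clonal_analytic(n_seeded: int, k_act: float, t_fill: float) -> float:
    """P(no second activation within t_fill of the first), memoryless model."""
    if n_seeded < 1:
        raise ValueError("n_seeded must be at least 1")
    return math.exp(-(n_seeded - 1) * k_act * t_fill)


def p_clonal_simulated(n_seeded: int, k_act: float, t_fill: float,
                       trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: draw activation delays, compare first two."""
    if n_seeded < 1:
        raise ValueError("n_seeded must be at least 1")
    rng = random.Random(seed)
    clonal = 0
    for _ in range(trials):
        delays = sorted(rng.expovariate(k_act) for _ in range(n_seeded))
        # A single template is trivially clonal; otherwise the site stays
        # clonal if the runner-up activates only after the site has filled.
        if n_seeded == 1 or delays[1] - delays[0] > t_fill:
            clonal += 1
    return clonal / trials


if __name__ == "__main__":
    for k_act in (0.01, 1.0):  # slow vs fast activation, arbitrary units
        print(k_act, p_clonal_analytic(3, k_act, 1.0),
              p_clonal_simulated(3, k_act, 1.0))
```

In this model, slow activation relative to fill time keeps most sites clonal even with three templates present, while activation as fast as filling does not.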
In one embodiment, the method is carried out to simultaneously (i) transport modified target nucleic acids to amplification sites at an average transport rate, and (ii) amplify the modified target nucleic acids that are at the amplification sites at an average amplification rate, wherein the average amplification rate exceeds the average transport rate (U.S. Pat. No. 9,169,513). Accordingly, kinetic exclusion can be achieved in such embodiments by using a relatively slow rate of transport. For example, a sufficiently low concentration of modified target nucleic acids can be selected to achieve a desired average transport rate, lower concentrations resulting in slower average rates of transport. Alternatively or additionally, a high viscosity solution and/or presence of molecular crowding reagents in the solution can be used to reduce transport rates. Examples of useful molecular crowding reagents include, but are not limited to, polyethylene glycol (PEG), ficoll, dextran, or polyvinyl alcohol. Exemplary molecular crowding reagents and formulations are set forth in U.S. Pat. No. 7,399,590, which is incorporated herein by reference. Another factor that can be adjusted to achieve a desired transport rate is the average size of the target nucleic acids.
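The effect of template concentration on transport-limited kinetic exclusion can be sketched with a Poisson arrival model. This is an illustrative assumption, not the disclosed method: templates are assumed to reach a site at a mean rate proportional to concentration, and a seeded site is assumed to fill in a fixed time `t_fill`, during which any further arrival would yield a mixed site. Edge effects near the end of the run are ignored, and all parameter names and values are hypothetical.

```python
import math


def seeding_outcome(concentration_pm: float, rate_per_pm_min: float,
                    t_fill_min: float, t_run_min: float) -> tuple[float, float]:
    """Return (P(site seeded during run), P(seeded site stays clonal)).

    Hypothetical model: arrivals per site form a Poisson process with rate
    rate_per_pm_min * concentration_pm (arrivals per minute).
    """
    arrival_rate = rate_per_pm_min * concentration_pm
    p_seeded = 1.0 - math.exp(-arrival_rate * t_run_min)
    # Clonal if no other template arrives while the first seed fills the site.
    p_clonal_given_seeded = math.exp(-arrival_rate * t_fill_min)
    return p_seeded, p_clonal_given_seeded


if __name__ == "__main__":
    for conc in (1.0, 10.0, 100.0):
        seeded, clonal = seeding_outcome(conc, rate_per_pm_min=0.001,
                                         t_fill_min=5.0, t_run_min=60.0)
        print(f"{conc:6.1f} pM: seeded {seeded:.3f}, clonal if seeded {clonal:.3f}")
```

In this toy model, lowering concentration slows transport and raises the chance that a seeded site stays clonal, at the cost of fewer sites seeded within a fixed run time.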
An amplification reagent can include further components that facilitate amplicon formation, and in some cases increase the rate of amplicon formation. An example is a recombinase. Recombinase can facilitate amplicon formation by allowing repeated invasion/extension. More specifically, recombinase can facilitate invasion of a modified target nucleic acid by the polymerase and extension of a primer by the polymerase using the modified target nucleic acid as a template for amplicon formation. This process can be repeated as a chain reaction where amplicons produced from each round of invasion/extension serve as templates in a subsequent round. The process can occur more rapidly than standard PCR since a denaturation cycle (e.g., via heating or chemical denaturation) is not required. As such, recombinase-facilitated amplification can be carried out isothermally. It is generally desirable to include ATP, or other nucleotides (or in some cases non-hydrolyzable analogs thereof) in a recombinase-facilitated amplification reagent to facilitate amplification. A mixture of recombinase and single-stranded binding (SSB) protein is particularly useful as SSB can further facilitate amplification. Exemplary formulations for recombinase-facilitated amplification include those sold commercially as TwistAmp kits by TwistDx (Cambridge, UK). Useful components of recombinase-facilitated amplification reagent and reaction conditions are set forth in US 5,223,414 and US 7,399,590.
Another example of a component that can be included in an amplification reagent to facilitate amplicon formation, and in some cases to increase the rate of amplicon formation, is a helicase. Helicase can facilitate amplicon formation by unwinding double-stranded nucleic acid, thereby allowing a chain reaction of amplicon formation. The process can occur more rapidly than standard PCR since a denaturation cycle (e.g., via heating or chemical denaturation) is not required. As such, helicase-facilitated amplification can be carried out isothermally. A mixture of helicase and single-stranded binding (SSB) protein is particularly useful as SSB can further facilitate amplification. Exemplary formulations for helicase-facilitated amplification include those sold commercially as IsoAmp kits from BioHelix (Beverly, MA). Further examples of useful formulations that include a helicase protein are described in US 7,399,590 and US 7,829,284.
Yet another example of a component that can be included in an amplification reagent to facilitate amplicon formation and in some cases increase the rate of amplicon formation is an origin binding protein.
Methods of Sequencing
Following attachment of modified target nucleic acids to a surface, the sequence of the immobilized and amplified modified target nucleic acids is determined. Sequencing can be carried out using any suitable sequencing technique, and methods for determining the sequence of immobilized and amplified modified target nucleic acids, including strand re-synthesis, are known in the art and are described in, for instance, Bignell et al. (US 8,053,192), Gunderson et al. (WO2016/130704), Shen et al. (US 8,895,249), and Pipenburg et al. (US 9,309,502).
The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleotide base type from another, are particularly applicable. In some embodiments, the process to determine the nucleotide sequence of a modified target nucleic acid can be an automated process. Preferred embodiments include sequencing-by-synthesis ("SBS") techniques.
SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand. In traditional methods of SBS, a single nucleotide monomer may be provided to a target nucleic acid in the presence of a polymerase in each delivery. However, in the methods described herein, more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a delivery.
In one embodiment, a nucleotide monomer includes locked nucleic acids (LNAs) or bridged nucleic acids (BNAs). The use of LNAs or BNAs in a nucleotide monomer increases hybridization strength between a nucleotide monomer and a sequencing primer sequence present on an immobilized modified target nucleic acid.
SBS can use nucleotide monomers that have a terminator moiety or those that lack any terminator moieties. Methods using nucleotide monomers lacking terminators include, for example, pyrosequencing and sequencing using γ-phosphate-labeled nucleotides, as set forth in further detail herein. In methods using nucleotide monomers lacking terminators, the number of nucleotides added in each cycle is generally variable and dependent upon the template sequence and the mode of nucleotide delivery. For SBS techniques that use nucleotide monomers having a terminator moiety, the terminator can be effectively irreversible under the sequencing conditions used, as is the case for traditional Sanger sequencing which utilizes dideoxynucleotides, or the terminator can be reversible, as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
SBS techniques can use nucleotide monomers that have a label moiety or those that lack a label moiety. Accordingly, incorporation events can be detected based on a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer, such as molecular weight or charge; a byproduct of incorporation of the nucleotide, such as release of pyrophosphate; or the like. In embodiments where two or more different nucleotides are present in a sequencing reagent, the different nucleotides can be distinguishable from each other, or alternatively the two or more different labels can be indistinguishable under the detection techniques being used. For example, the different nucleotides present in a sequencing reagent can have different labels and they can be distinguished using appropriate optics as exemplified by the sequencing methods developed by Solexa (now Illumina, Inc.).
Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release." Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P. (1998) "A sequencing method based on real-time pyrophosphate." Science 281(5375), 363; U.S. Pat. Nos. 6,210,891; 6,258,568 and 6,274,320). In pyrosequencing, released PPi can be detected by being immediately converted to adenosine triphosphate (ATP) by ATP sulfurase, and the level of ATP generated is detected via luciferase-produced photons. The nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of nucleotides at the features of the array. An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images. The images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
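The relationship between nucleotide flows and signal in pyrosequencing can be illustrated with an idealized flowgram. The sketch assumes a noise-free signal exactly proportional to the number of identical nucleotides incorporated in each flow, a cyclic flow order (here the hypothetical order `TACG`), and a known nascent-strand sequence; signal decay and incomplete extension are not modeled. It also shows why homopolymer length is read from signal intensity rather than from separate cycles.

```python
def flowgram(nascent: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing flowgram for a nascent strand (5'->3').

    Each flow adds one nucleotide type; the signal equals the number of
    consecutive bases of that type incorporated (the homopolymer length),
    or 0 if the next base to be added does not match.
    """
    if not set(nascent) <= set(flow_order):
        raise ValueError("nascent strand contains bases absent from flow_order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(nascent):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(nascent) and nascent[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Recover the sequence by repeating each flow's base by its signal."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    print(flowgram("TTAGGC"))
    # [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
```

Zero-signal flows carry information too: they mark positions where the flowed nucleotide was not the next base, which is how the flow order and signal pattern together reconstruct the sequence.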
In another exemplary type of SBS, cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in WO 04/018497 and U.S. Pat. No. 7,057,026. This approach is being commercialized by Solexa (now Illumina Inc.), and is also described in WO 91/06678 and WO 07/123,744. The availability of fluorescently-labeled terminators in which both the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing. Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.
In some reversible terminator-based sequencing embodiments, the labels do not substantially inhibit extension under SBS reaction conditions. However, the detection labels can be removable, for example, by cleavage or degradation. Images can be captured following incorporation of labels into arrayed nucleic acid features. In particular embodiments, each cycle involves simultaneous delivery of four different nucleotide types to the array and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types can be added sequentially and an image of the array can be obtained between each addition step. In such embodiments, each image will show nucleic acid features that have incorporated nucleotides of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature. However, the relative position of the features will remain unchanged in the images. Images obtained from such reversible terminator-based SBS methods can be stored, processed and analyzed as set forth herein. Following the image capture step, labels can be removed and reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removal of the labels after they have been detected in a particular cycle and prior to a subsequent cycle can provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth herein.
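With four spectrally distinct labels and one image per detection channel, a basic base call for each feature and cycle reduces to picking the brightest channel. The sketch below is a simplified illustration: the channel-to-base assignment and the `min_signal` cutoff are hypothetical, and it omits crosstalk correction, phasing correction, and quality scoring that practical base callers apply.

```python
from typing import Sequence

CHANNEL_BASES = ("A", "C", "G", "T")  # hypothetical channel order


def call_four_channel(cycles: Sequence[Sequence[float]], min_signal: float = 0.0) -> str:
    """Call one base per cycle as the channel with the highest intensity.

    cycles: per-cycle intensities for one feature, ordered as CHANNEL_BASES.
    Cycles whose brightest channel does not exceed min_signal are called 'N'.
    """
    calls = []
    for intensities in cycles:
        if len(intensities) != len(CHANNEL_BASES):
            raise ValueError("expected one intensity per channel")
        best = max(range(len(CHANNEL_BASES)), key=lambda i: intensities[i])
        calls.append(CHANNEL_BASES[best] if intensities[best] > min_signal else "N")
    return "".join(calls)


if __name__ == "__main__":
    feature_cycles = [
        [0.9, 0.1, 0.05, 0.0],
        [0.1, 0.2, 0.8, 0.1],
        [0.0, 0.1, 0.1, 0.7],
    ]
    print(call_four_channel(feature_cycles))
```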
In particular embodiments some or all of the nucleotide monomers can include reversible terminators. In such embodiments, reversible terminators/cleavable fluorophores can include fluorophores linked to the ribose moiety via a 3' ester linkage (Metzker, Genome Res. 15:1767-1776 (2005)). Other approaches have separated the terminator chemistry from the cleavage of the fluorescence label (Ruparel et al., Proc Natl Acad Sci USA 102: 5932-7 (2005)). Ruparel et al. described the development of reversible terminators that used a small 3' allyl group to block extension, but could easily be deblocked by a short treatment with a palladium catalyst. The fluorophore was attached to the base via a photocleavable linker that could easily be cleaved by a 30-second exposure to long-wavelength UV light. Thus, cleavable linkers can be cleaved by, for example, disulfide reduction or photocleavage. Another approach to reversible termination is the use of natural termination that ensues after placement of a bulky dye on a dNTP. The presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance. The presence of one incorporation event prevents further incorporations unless the dye is removed. Cleavage of the dye removes the fluorophore and effectively reverses the termination. Examples of modified nucleotides are also described in U.S. Pat. Nos. 7,427,673 and 7,057,026.
Additional exemplary SBS systems and methods which can be used with the methods and systems described herein are described in U.S. Pub. Nos. 2007/0166705, 2006/0188901, 2006/0240439, 2006/0281109, 2012/0270305, and 2013/0260372, U.S. Pat. No. 7,057,026, PCT Publication No. WO 05/065814, U.S. Patent Application Publication No. 2005/0100900, and PCT Publication Nos. WO 06/064199 and WO 07/010,251.
Some embodiments can use detection of four different nucleotides using fewer than four different labels. For example, SBS can be performed using methods and systems described in the incorporated materials of U.S. Pub. No. 2013/0079232. As a first example, a pair of nucleotide types can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via chemical modification, photochemical modification or physical modification) that causes apparent signal to appear or disappear compared to the signal detected for the other member of the pair. As a second example, three of four different nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal. As a third example, one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels. The aforementioned three exemplary configurations are not considered mutually exclusive and can be used in various combinations.
An exemplary embodiment that combines all three examples is a fluorescence-based SBS method that uses a first nucleotide type that is detected in a first channel (e.g., dATP having a label that is detected in the first channel when excited by a first excitation wavelength), a second nucleotide type that is detected in a second channel (e.g., dCTP having a label that is detected in the second channel when excited by a second excitation wavelength), a third nucleotide type that is detected in both the first and the second channel (e.g., dTTP having at least one label that is detected in both channels when excited by the first and/or second excitation wavelength) and a fourth nucleotide type that lacks a label, or whose label is not detected or is only minimally detected, in either channel (e.g., dGTP having no label).
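The combined configuration above maps each base to a pattern of presence or absence in two channels: A in the first channel only, C in the second only, T in both, and G in neither. A minimal decoding sketch follows; the fixed intensity threshold and the function names are hypothetical, and a single global cutoff is a simplification of how intensities would be classified in practice.

```python
from typing import Iterable, Tuple

# (detected in channel 1, detected in channel 2) -> base, following the
# exemplary scheme: dATP ch1, dCTP ch2, dTTP both, dGTP unlabeled.
TWO_CHANNEL_CODE = {
    (True, False): "A",
    (False, True): "C",
    (True, True): "T",
    (False, False): "G",  # inferred from absence of signal in both channels
}


def call_two_channel(ch1: float, ch2: float, threshold: float = 0.5) -> str:
    """Call one base from a pair of channel intensities."""
    return TWO_CHANNEL_CODE[(ch1 >= threshold, ch2 >= threshold)]


def call_read(pairs: Iterable[Tuple[float, float]], threshold: float = 0.5) -> str:
    """Call a read from per-cycle (channel 1, channel 2) intensity pairs."""
    return "".join(call_two_channel(a, b, threshold) for a, b in pairs)


if __name__ == "__main__":
    print(call_read([(0.9, 0.1), (0.1, 0.9), (0.8, 0.7), (0.05, 0.1)]))
```

Two images per cycle thus suffice to distinguish four bases, at the cost of relying on signal absence for one of them.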
Further, as described in U.S. Pub. No. 2013/0079232, sequencing data can be obtained using a single channel. In such so-called one-dye sequencing approaches, the first nucleotide type is labeled but the label is removed after the first image is generated, and the second nucleotide type is labeled only after a first image is generated. The third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
Some embodiments can use sequencing-by-ligation techniques. Such techniques use DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides. The oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize. As with other SBS methods, images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature, but the relative position of the features will remain unchanged in the images. Images obtained from ligation-based sequencing methods can be stored, processed and analyzed as set forth herein. Exemplary SBS systems and methods which can be utilized with the methods and systems described herein are described in U.S. Pat. Nos. 6,969,488, 6,172,218, and 6,306,597.
Some embodiments can use nanopore sequencing (Deamer, D. W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing." Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, "Characterization of nucleic acids by nanopore analysis", Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J. A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope" Nat. Mater. 2:611-615 (2003)). In such embodiments, the modified target nucleic acid passes through a nanopore. The nanopore can be a synthetic pore or a biological membrane protein, such as α-hemolysin. As the modified target nucleic acid passes through the nanopore, each base-pair can be identified by measuring fluctuations in the electrical conductance of the pore (U.S. Pat. No. 7,001,792; Soni, G. V. & Meller, A. "Progress toward ultrafast DNA sequencing using solid-state nanopores." Clin. Chem. 53, 1996-2001 (2007); Healy, K. "Nanopore-based single-molecule DNA analysis." Nanomed. 2, 459-481 (2007); Cockroft, S. L., Chu, J., Amorin, M. & Ghadiri, M. R. "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution." J. Am. Chem. Soc. 130, 818-820 (2008)). Data obtained from nanopore sequencing can be stored, processed and analyzed as set forth herein. In particular, the data can be treated as an image in accordance with the exemplary treatment of optical images and other images that is set forth herein.
Some embodiments can use methods involving the real-time monitoring of DNA polymerase activity. Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. Nos. 7,329,492 and 7,211,414, or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No. 7,315,019, and using fluorescent nucleotide analogs and engineered polymerases as described, for example, in U.S. Pat. No. 7,405,281 and U.S. Pub. No. 2008/0108082. The illumination can be restricted to a zeptoliter-scale volume around a surface-tethered polymerase such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M. J. et al. "Zero-mode waveguides for single-molecule analysis at high concentrations." Science 299, 682-686 (2003); Lundquist, P. M. et al. "Parallel confocal detection of single molecules in real time." Opt. Lett. 33, 1026-1028 (2008); Korlach, J. et al. "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures." Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008)). Images obtained from such methods can be stored, processed and analyzed as set forth herein.
Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary) or sequencing methods and systems described in U.S. Pub. Nos. 2009/0026082; 2009/0127589; 2010/0137143; and 2010/0282617. Methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons that are used to detect protons.
The above SBS methods can be advantageously carried out in multiplex formats such that multiple different modified target nucleic acids are manipulated simultaneously. In particular embodiments, different modified target nucleic acids can be treated in a common reaction vessel or on a surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex manner. In embodiments using surface-bound target nucleic acids, the modified target nucleic acids can be in an array format. In an array format, the modified target nucleic acids are typically bound to a surface in a spatially distinguishable manner. The modified target nucleic acids can be bound by direct covalent attachment, attachment to a bead or other particle or binding to a polymerase or other molecule that is attached to the surface. The array can include a single copy of a modified target nucleic acid at each site (also referred to as a feature) or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as bridge amplification or emulsion PCR as described in further detail herein.
The methods set forth herein can use arrays having features at any of a variety of densities including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.
An advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art such as those exemplified herein. Thus, an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized modified target nucleic acids, the system including components such as pumps, valves, reservoirs, fluidic lines and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US Pat. No. 8,241,573 and US Pat. No. 8,951,781. As exemplified for flow cells, one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods. Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US Pat. No. 8,951,781.
While the embodiments presented herein are generally described using a sequencing platform (such as a sequencing by synthesis platform) as a readout, one of ordinary skill in the art will recognize that nucleic acids modified by the engineered deaminases presented herein can also be detected using any other suitable readout methodology. For example, the location and identity of modified cytosines can be assessed using a microarray. Any of a variety of analyte arrays (also referred to as “microarrays”) known in the art can be used in a method or system set forth herein. A typical array contains analytes, each having an individual probe or a population of probes. In the latter case, the population of probes at each analyte is typically homogeneous, having a single species of probe. For example, in the case of a nucleic acid array, each analyte can have multiple nucleic acid molecules each having a common sequence. However, in some implementations the populations at each analyte of an array can be heterogeneous. Similarly, protein arrays can have analytes with a single protein or a population of proteins typically, but not always, having the same amino acid sequence. The probes can be attached to the surface of an array, for example, via covalent linkage of the probes to the surface or via non-covalent interaction(s) of the probes with the surface. In some implementations, probes, such as nucleic acid molecules, can be attached to a surface via a gel layer as described, for example, in U.S. patent application Ser. No. 13/784,368 and US Pat. App. Pub. No. 2011/0059865 A1.
Example arrays include, without limitation, a BeadChip Array available from Illumina, Inc. (San Diego, Calif.) or others such as those where probes are attached to beads that are present on a surface (e.g., beads in wells on a surface) such as those described in U.S. Pat. Nos. 6,266,459; 6,355,431; 6,770,441; 6,859,570; or 7,622,294; or PCT Publication No. WO 00/63437. Further examples of commercially available microarrays that can be used include, for example, an Affymetrix® GeneChip® microarray or other microarray synthesized in accordance with techniques sometimes referred to as VLSIPS™ (Very Large Scale Immobilized Polymer Synthesis) technologies. A spotted microarray can also be used in a method or system according to some implementations of the present disclosure. An example spotted microarray is a CodeLink™ Array available from Amersham Biosciences. Another microarray that is useful is one that is manufactured using inkjet printing methods such as SurePrint™ Technology available from Agilent Technologies.
In a specific embodiment, an engineered deaminase as presented herein can be used to convert 5-methyl cytosine (5mC) to thymidine (T) by deamination as described herein, such as by providing a sample of DNA suspected of including single-stranded DNA including at least one 5-methyl cytosine (5mC), at least one 5-hydroxymethyl cytosine (5hmC), at least one 5-formyl cytosine (5fC), at least one 5-carboxy cytosine (5CaC), or a combination thereof; and contacting the DNA with the engineered deaminase under conditions suitable for conversion of 5-methyl cytosine (5mC) to thymidine (T) by deamination at a greater rate than conversion of cytosine (C) to uracil (U) by deamination, to result in converted single-stranded DNA, wherein 5mC, 5hmC, 5fC, and/or 5CaC are converted to T.
In a specific embodiment, an engineered deaminase of the present disclosure can be used to detect 5hmC as described herein, such as by providing a sample of DNA suspected of including single-stranded DNA that has at least one 5-hydroxymethyl cytosine (5hmC); and contacting the DNA with the engineered deaminase under conditions suitable for conversion of unmodified cytosine to uracil and of 5mC to thymidine, with no detectable conversion of 5hmC to 5hmU.
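Under the conditions of the 5hmC-detection embodiment, unmodified C and 5mC both read as T after conversion and amplification (uracil is copied as T by the polymerase), while 5hmC is protected and still reads as C. A minimal Python sketch of the resulting per-position call follows. It assumes a gap-free alignment of equal-length strings taken from the converted strand, and the function name `call_5hmc` is hypothetical.

```python
def call_5hmc(reference: str, read: str) -> dict[int, str]:
    """Classify each reference C position in an aligned, converted read.

    Assumes a gap-free alignment of equal-length strings, with the read taken
    from the converted strand (uracil reads as T after amplification).
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    calls: dict[int, str] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "5hmC"        # protected from deamination
        elif read_base == "T":
            calls[i] = "C or 5mC"    # deaminated to U or T; indistinguishable here
        else:
            calls[i] = "ambiguous"   # mismatch unrelated to conversion
    return calls


if __name__ == "__main__":
    print(call_5hmc("ACGTCCA", "ATGTCTA"))
```

Because unmodified C and 5mC give the same readout under these conditions, this sketch only isolates 5hmC; separating C from 5mC would require a separate measurement.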
The converted single-stranded DNA can then be processed as needed to facilitate hybridization to a microarray. For example, the converted DNA can be amplified. Any one of a number of amplification methods as are known in the art can be performed. For example, whole-genome amplification or amplification using universal primers that hybridize to a common region in the converted DNA, such as an adaptor sequence, can be used. Additionally or alternatively, the converted DNA can be fragmented. Fragmentation can be performed prior to or following amplification, or in the absence of amplification. Any one of a number of fragmentation methods as are known in the art can be performed. As one example, fragmentation can be performed using an enzymatic process, such as a restriction endonuclease or other enzyme capable of cleaving the converted DNA. As another example, fragmentation can be performed using mechanical means, such as shearing using, for example, a sonication device such as those supplied by Covaris. The fragmented converted DNA can then be precipitated and/or resuspended in a buffer suitable for hybridization to a microarray. Following hybridization, the methylation state of regions of interest, such as a specific CpG locus or loci, can be interrogated at specific locations on the microarray. Methods of preparing converted DNA for microarray analysis are known in the art. One example of such methods is described in the Methylation Protocol Guide for the Infinium HD Assay from Illumina (San Diego, CA). Whereas such a protocol guide may describe use of a microarray designed for interrogation of bisulfite-converted DNA, it will be understood that array features, specifically probe sequences, can be specifically designed for DNA that has not been bisulfite converted. As an example, a commercially available microarray such as the Infinium MethylationEPIC BeadChip (Illumina) is specifically designed to hybridize with DNA fragments with reduced complexity, as found in bisulfite-converted DNA, where most if not all cytosines are converted to thymidine. Thus, for example, the same CpG sites can be interrogated in non-bisulfite-converted DNA by using a microarray including probes designed to hybridize to the same regions of native, non-bisulfite-converted DNA. One of skill in the art could readily obtain such a microarray. In one embodiment, a custom array could be designed using the manifest for an array such as the Infinium MethylationEPIC BeadChip, by using the “Forward Sequence” to identify a probe sequence including native DNA sequence that covers a similar or identical sequence region for the allele-specific probe sequences, which are designed to hybridize to DNA sequences where most or all cytosines have been converted to thymidine. Using such an array designed to hybridize to native (non-bisulfite-converted) DNA sequences, the methodologies and analysis methods described in the Methylation Protocol Guide for the Infinium HD Assay from Illumina (San Diego, CA) could be followed to identify methylated CpG sites in the sample DNA.
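The reduced complexity of bisulfite-converted DNA, and why probes for native DNA differ from probes for converted DNA, can be seen by converting a sequence in silico. The Python sketch below assumes that bisulfite turns every unmethylated C into T and, as a simplifying assumption for probe design, treats all CpG cytosines as either methylated or unmethylated. It handles only the strand it is given, and the function names are hypothetical. A probe for native DNA, as described above, would instead be designed against the unconverted forward sequence.

```python
from collections import Counter


def in_silico_bisulfite(seq: str, cpg_methylated: bool = True) -> str:
    """Convert a single strand as bisulfite would, in silico.

    Assumes unmethylated C reads as T. CpG cytosines are kept as C when
    cpg_methylated is True and converted to T otherwise.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (is_cpg and cpg_methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def base_fractions(seq: str) -> dict[str, float]:
    """Fraction of each of A, C, G, T in a non-empty sequence."""
    counts = Counter(seq.upper())
    return {b: counts[b] / len(seq) for b in "ACGT"}


if __name__ == "__main__":
    native = "ACCGTCA"
    converted = in_silico_bisulfite(native)
    print(converted)                                           # ATCGTTA
    print(base_fractions(native)["C"], base_fractions(converted)["C"])
```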
Compositions
The present disclosure also provides compositions that include an engineered deaminase described herein. The composition can include one or more other components in addition to the engineered deaminase. For example, the other component can include a single-stranded DNA or RNA substrate that includes, or is suspected of including, at least one modified cytosine, such as a 5-methyl cytosine (5mC), a 5-hydroxymethyl cytosine (5hmC), a 5-formyl cytosine (5fC), a 5-carboxy cytosine (5CaC), or a combination thereof. In another example, a single-stranded DNA or RNA substrate can be one including one or more known modified cytosines, e.g., a single-stranded DNA or RNA substrate that can be used as a control to measure conversion efficiency. In another example, the other component can include a “buffer”, i.e., a solution that resists changes in pH when acid or alkali is added to it. Examples of suitable buffers include, without limitation, a citrate buffer, a sodium acetate buffer, or a Bis-Tris buffer. In another example, the other component can include a reductant, including but not limited to DTT and/or TCEP. In a further example, the other component can include Zn.
A composition can also include a polynucleotide encoding an engineered deaminase described herein. The polynucleotide can be present in a vector, such as a plasmid or viral vector. A vector that includes the polynucleotide can be present in a host cell, such as E. coli.
Kits
The present disclosure also provides kits for determining the methylation status of DNA or RNA. A kit includes at least one engineered deaminase described herein and one or more other components in a suitable packaging material in an amount sufficient for at least one reaction. Examples of other components include a positive control polynucleotide, such as a single-stranded DNA including one or more known modified cytosines for use in measuring conversion efficiency, or a negative control polynucleotide, such as a single-stranded DNA including unmodified cytosines. Another component can be a glucosyltransferase, such as T4-beta glucosyltransferase. Optionally, other reagents such as buffers and solutions needed to use the engineered deaminase and nucleotide solution are also included. Instructions for use of the packaged components are also typically included.
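Conversion efficiency from the positive and negative control polynucleotides can be estimated by counting how often known cytosine positions read as T. The Python sketch below assumes aligned, gap-free control reads and a hypothetical list of positions to score. Applying it to the positive-control positions and to the negative-control positions gives two rates that can be compared with the behavior expected for the deaminase and reaction conditions in use.

```python
import math


def conversion_rate(reference: str, reads: list[str], positions: list[int]) -> float:
    """Fraction of C/T observations at the given reference C positions that read as T.

    Bases other than C or T at a scored position are ignored; NaN is returned
    when there are no usable observations.
    """
    for pos in positions:
        if reference[pos] != "C":
            raise ValueError(f"reference position {pos} is not C")
    converted = total = 0
    for read in reads:
        for pos in positions:
            base = read[pos]
            if base == "T":
                converted += 1
                total += 1
            elif base == "C":
                total += 1
    return converted / total if total else math.nan


if __name__ == "__main__":
    # Hypothetical control: C at positions 1 and 3, two aligned reads.
    print(conversion_rate("ACGCA", ["ATGTA", "ACGTA"], [1, 3]))  # 0.75
```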
As used herein, the phrase "packaging material" refers to one or more physical structures used to house the contents of the kit. The packaging material is constructed by known methods, preferably to provide a sterile, contaminant-free environment. The packaging material has a label which indicates that the components can be used for determining the methylation status of DNA or RNA. In addition, the packaging material contains instructions indicating how the materials within the kit are employed to practice a reaction with an engineered deaminase. As used herein, the term "package" refers to a solid matrix or material such as glass, plastic, paper, foil, and the like, capable of holding the polypeptides within fixed limits. "Instructions for use" typically include a tangible expression describing the reagent concentration or at least one assay method parameter, such as the relative amounts of reagent and sample to be admixed, maintenance time periods for reagent/sample admixtures, temperature, buffer conditions, and the like.
Definitions
Terms used herein will be understood to take on their ordinary meaning in the relevant art unless specified otherwise. Several terms used herein and their meanings are set forth below.
As used herein, the terms "organism" and "subject" are used interchangeably and refer to microbes (e.g., prokaryotic or eukaryotic), animals, and plants. An example of an animal is a mammal, such as a human.
As used herein, the term "target nucleic acid" is intended as a semantic identifier for the nucleic acid in the context of a method or composition or kit set forth herein and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated. Reference to a nucleic acid such as a target nucleic acid includes both single-stranded and double-stranded nucleic acids, and both DNA and RNA, unless indicated otherwise. The term "library" refers to the collection of target nucleic acids containing known common sequences, such as a universal sequence or adapter, at their 3' and 5' ends.
As used herein, the term "adapter" and its derivatives, e.g., universal adapter, refers generally to any linear oligonucleotide which can be attached to a target nucleic acid. An adapter can be single-stranded or double-stranded DNA, or can include both double-stranded and single-stranded regions. An adapter can include a universal sequence that is substantially identical, or substantially complementary, to at least a portion of a primer, for example a universal primer; an index (also referred to herein as a barcode or tag) to assist with downstream error correction, identification, or sequencing; and/or a unique molecular identifier. In some embodiments, the adapter is substantially non-complementary to the 3' end or the 5' end of any target sequence present in the sample. In some embodiments, suitable adapter lengths are in the range of about 6-100 nucleotides, about 12-60 nucleotides, or about 15-50 nucleotides in length. The terms "adaptor" and "adapter" are used interchangeably.
As used herein, the term "universal," when used to describe a nucleotide sequence, refers to a region of sequence that is common to two or more nucleic acid molecules where the molecules also have regions of sequence that differ from each other. A universal sequence that is present in different members of a collection of nucleic acids can be used as, for instance, a "landing pad" in a subsequent step to anneal a nucleotide sequence that can be used as a primer for addition of another nucleotide sequence, such as an index, to a target nucleic acid. A universal sequence that is present in different members of a collection of nucleic acids can allow capture of multiple different nucleic acids using a population of universal capture nucleic acids, e.g., capture oligonucleotides that are complementary to a portion of the universal sequence, e.g., a universal capture sequence. Non-limiting examples of universal capture sequences include sequences that are identical to or complementary to P5 and P7 primers. Similarly, a universal sequence present in different members of a collection of molecules can allow the replication (e.g., sequencing) or amplification of multiple different nucleic acids using a population of universal primers that are
complementary to a portion of the universal sequence, e.g., a universal anchor sequence. In one embodiment, universal anchor sequences are used as a site to which a universal primer (e.g., a sequencing primer for read 1 or read 2) anneals for sequencing. A capture oligonucleotide or a universal primer therefore includes a sequence that can hybridize specifically to a universal sequence.
The terms "P5" and "P7" may be used when referring to a universal capture sequence or a capture oligonucleotide. The terms "P5′" (P5 prime) and "P7′" (P7 prime) refer to the complement of P5 and P7, respectively. It will be understood that any suitable universal capture sequence or capture oligonucleotide can be used in the methods presented herein, and that the use of P5 and P7 represents exemplary embodiments only. Uses of capture oligonucleotides such as P5 and P7 or their complements on flow cells are known in the art, as exemplified by the disclosures of WO 2007/010251, WO 2006/064199, WO 2005/065814, WO 2015/106941, WO 1998/044151, and WO 2000/018957, which are incorporated by reference as to P5 and P7 and their uses. For example, any suitable forward amplification primer, whether immobilized or in solution, can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence. Similarly, any suitable reverse amplification primer, whether immobilized or in solution, can be useful in the methods presented herein for hybridization to a complementary sequence and amplification of a sequence. One of skill in the art will understand how to design and use primer sequences that are suitable for capture and/or amplification of nucleic acids as presented herein.
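Because the two strands of a duplex are antiparallel, a complement such as P5′, written 5′→3′, is the reverse complement of P5. The Python sketch below computes a reverse complement and checks, by exact string matching, whether a fragment carries a site to which a given capture oligonucleotide could anneal. The sequences are hypothetical placeholders, not the actual P5 or P7 sequences, and exact matching is a simplification of hybridization, which tolerates some mismatches.

```python
# Complement table for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_capture_site(capture_oligo: str, fragment: str) -> bool:
    """True if the fragment contains the exact reverse complement of the capture oligo."""
    return reverse_complement(capture_oligo) in fragment


if __name__ == "__main__":
    capture = "ATGCCA"  # hypothetical capture sequence, not P5 or P7
    print(reverse_complement(capture))                  # TGGCAT
    print(has_capture_site(capture, "GGTGGCATGG"))      # True
```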
As used herein, the term "primer" and its derivatives refer generally to any nucleic acid that can hybridize to a target sequence of interest. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase or to which a polynucleotide can be ligated; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. In some embodiments, the primer can be used for hybridization to a predetermined sequence, for instance a predetermined sequence that includes one or more nucleotides that identify the location of a modified cytosine. In one embodiment, a “primer” includes a sequence present in a guide RNA used with a CRISPR-based system to hybridize to a predetermined sequence. The primer can
include any combination of nucleotides or analogs thereof. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide.
The terms "polynucleotide," "oligonucleotide," and "nucleic acid" are used interchangeably herein to refer to a polymeric form of nucleotides of any length, and may include ribonucleotides, deoxyribonucleotides, analogs thereof, or mixtures thereof. The terms should be understood to include, as equivalents, analogs of either DNA, RNA, cDNA, or antibody-oligo conjugates made from nucleotide analogs and to be applicable to single stranded (such as sense or antisense) and double stranded polynucleotides. The terms as used herein also encompass cDNA, that is, complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase.
As used herein, an "index" (also referred to as an "index region," "index adaptor," "tag," or a "barcode") refers to a unique nucleic acid tag that can be used to identify a sample or source of the nucleic acid material, or a compartment in which a target nucleic acid was present. The index can be present in solution or on a solid-support, or attached to or associated with a solid-support and released in solution or compartment. When nucleic acid samples are derived from multiple sources, the nucleic acids in each nucleic acid sample can be tagged with different nucleic acid tags such that the source of the sample can be identified. Any suitable index or set of indexes can be used, as known in the art and as exemplified by the disclosures of U.S. Pat. No. 8,053,192, PCT Publication No. WO 05/068656, and U.S. Pat. Publication No. 2013/0274117. In some embodiments, an index can include a six-base Index 1 (i7) sequence, an eight-base Index 1 (i7) sequence, an eight-base Index 2 (i5) sequence, a ten-base Index 1 (i7) sequence, or a ten-base Index 2 (i5) sequence from Illumina, Inc. (San Diego, CA).
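Identifying the source of each read from its index, as described above, can be sketched as a nearest-match lookup. The Python example below assumes equal-length indexes, a hypothetical sample sheet, and a tolerance of one mismatch (a common choice, but an assumption here). Reads that match no index within the tolerance, or match two indexes equally well, are left unassigned.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, sample_indexes: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is closest to the observed index.

    Returns None if no index is within max_mismatches or if the closest
    match is tied between samples.
    """
    best: list[str] = []
    best_dist = max_mismatches + 1
    for sample, index in sample_indexes.items():
        d = hamming(observed, index)
        if d > max_mismatches:
            continue
        if d < best_dist:
            best, best_dist = [sample], d
        elif d == best_dist:
            best.append(sample)
    return best[0] if len(best) == 1 else None


if __name__ == "__main__":
    sheet = {"sample_1": "ACGTACGT", "sample_2": "TTTTGGGG"}  # hypothetical
    print(assign_sample("ACGTACGA", sheet))  # sample_1
```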
As used herein, the term "amplicon," when used in reference to a nucleic acid, means the product of copying the nucleic acid, wherein the product has a nucleotide sequence that is the same as or complementary to at least a portion of the nucleotide sequence of the nucleic acid. An amplicon can be produced by any of a variety of amplification methods that use the nucleic acid, or an amplicon thereof, as a template including, for example, polymerase extension, polymerase chain reaction (PCR), rolling circle amplification (RCA), ligation extension, or ligation chain reaction. An amplicon can be a nucleic acid molecule having a single copy of a particular nucleotide sequence (e.g., a PCR product) or multiple copies of the nucleotide sequence (e.g., a concatemeric product of RCA). A first amplicon of a target nucleic acid is typically a
complementary copy. Subsequent amplicons are copies that are created, after generation of the first amplicon, from the target nucleic acid or from the first amplicon. A subsequent amplicon can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid.
As used herein, "amplify", "amplifying" or "amplification reaction" and their derivatives, refer generally to any action or process whereby at least a portion of a nucleic acid molecule is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. Amplification is typically the exponential replication of a nucleic acid molecule. In some embodiments, such amplification can be performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. In some embodiments, "amplification" includes amplification of at least some portion of DNA and RNA based nucleic acids alone, or in combination. The amplification reaction can include any of the amplification processes known to one of ordinary skill in the art. In some embodiments, the amplification reaction includes polymerase chain reaction (PCR).
As used herein, the term "polymerase chain reaction" ("PCR") refers to the method of Mullis, U.S. Pat. Nos. 4,683,195 and 4,683,202, which describe a method for increasing the concentration of a segment of a polynucleotide of interest in a mixture of genomic DNA without cloning or purification. This process for amplifying the polynucleotide of interest consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired polynucleotide of interest, followed by a series of thermal cycles in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded polynucleotide of interest. The mixture is denatured at a higher temperature first and the primers are then annealed to complementary sequences within the polynucleotide of interest. Following annealing, the primers are extended with a polymerase to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (referred to as thermocycling) to obtain a high concentration of an
amplified segment of the desired polynucleotide of interest. The length of the amplified segment of the desired polynucleotide of interest (amplicon) is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of repeating the process, the method is referred to as PCR. Because the desired amplified segments of the polynucleotide of interest become the predominant nucleic acid sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified". In a modification to the method discussed above, the target nucleic acid molecules can be PCR amplified using a plurality of different primer pairs, in some cases, one or more primer pairs per target nucleic acid molecule of interest, thereby forming a multiplex PCR reaction.
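Two points in the PCR description above lend themselves to a worked example: the amplicon length is fixed by the relative positions of the primers, and repeated cycles grow the product exponentially. The Python sketch below finds the predicted amplicon by exact matching (the forward primer on the given strand, and the reverse primer's site as its reverse complement) and computes copy number assuming a constant per-cycle efficiency, which real reactions only approximate before they plateau. The template, primers, and function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Span of the template from the forward primer to the reverse primer site.

    The reverse primer anneals to the opposite strand, so its site on the
    given strand is its reverse complement. Exact matches only.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def copies_after(cycles: int, initial: float = 1.0, efficiency: float = 1.0) -> float:
    """Idealized yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return initial * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    amp = predicted_amplicon("GGGATCCAAATTTGCATGCCC", "ATCC", "CATGC")
    print(amp, len(amp))        # ATCCAAATTTGCATG 15
    print(copies_after(10))     # 1024.0
```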
As used herein, "amplification conditions" and its derivatives, generally refers to conditions suitable for amplifying one or more nucleic acid sequences. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences include polymerase chain reaction (PCR) conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences flanked by a universal sequence, or target specific primers, or to amplify an amplified target sequence flanked by one or more adapters. Generally, the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending and separating are repeated. Typically, the amplification conditions include cations such as Mg2+ or Mn2+ and can also include various modifiers of ionic strength.
As defined herein "multiplex amplification" refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the
target sequences are amplified within a single reaction vessel. The "plexy" or "plex" of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher. It is also possible to detect the amplified target sequences by several different methodologies (e.g., gel electrophoresis followed by densitometry; quantitation with a bioanalyzer or quantitative PCR; hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of ³²P-labeled deoxynucleotide triphosphates into the amplified target sequence).
As used herein, the term "amplification site" refers to a site in or on an array where one or more amplicons can be generated. An amplification site can be further configured to contain, hold or attach at least one amplicon that is generated at the site.
As used herein, the terms "array," "analyte array," and "microarray" are used interchangeably and refer to a population of sites that can be differentiated from each other according to relative location. Different molecules at different sites of an array can be differentiated from each other according to the locations of the sites in the array. An individual site of an array can include one or more molecules of a particular type. For example, a site can include a single target nucleic acid molecule having a particular sequence, or a site can include several nucleic acid molecules having the same sequence (and/or complementary sequences thereof). The sites of an array can be different features located on the same substrate. Exemplary features include, without limitation, droplets, wells in a substrate, beads (or other particles) in or on a substrate, projections from a substrate, ridges on a substrate, or channels in a substrate. The sites of an array can be separate substrates each bearing a different molecule. Different molecules attached to separate substrates can be identified according to the locations of the substrates on a surface with which the substrates are associated or according to the locations of the substrates in a liquid or gel. Exemplary arrays in which separate substrates are located on a surface include, without limitation, those having beads in wells.
As used herein, the term "compartment" is intended to mean an area or volume that separates or isolates something from other things. Exemplary compartments include, but are not limited to, vials, tubes, wells, droplets, boluses, beads, vessels, surface features, flow cells, or areas or volumes separated by physical forces such as fluid flow, magnetism, electrical current, or the like. In one embodiment, a compartment is a well of a multi-well plate, such as a 96- or 384-well plate. As used herein, a droplet may include a hydrogel bead, which is a bead that includes a hydrogel composition and can encapsulate one or more nuclei or cells. In some embodiments, the droplet is a homogeneous droplet of hydrogel material or a hollow droplet having a polymer hydrogel shell. Whether homogeneous or hollow, a droplet may be capable of encapsulating one or more nuclei or cells. In some embodiments, the droplet is a surfactant-stabilized droplet. In some embodiments, a single cell or nucleus is present per compartment. In some embodiments, two or more cells or nuclei are present per compartment. In some embodiments, each compartment contains a compartment-specific index. In some embodiments, the index is in solution or is attached to or associated with a solid phase in each compartment.
The term "flow cell" as used herein refers to a chamber comprising a solid surface across which one or more fluid reagents can be flowed. Examples of flow cells and related fluidic systems and detection platforms that can be readily used in the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281, and US 2008/0108082.
As used herein, the term "clonal population" refers to a population of nucleic acids that is homogeneous with respect to a particular nucleotide sequence. The homogeneous sequence is typically at least 10 nucleotides long, but can be even longer, including, for example, at least 50, 100, 250, 500, or 1000 nucleotides long. A clonal population can be derived from a single target nucleic acid or template nucleic acid. Typically, all of the nucleic acids in a clonal population will have the same nucleotide sequence. It will be understood that a small number of mutations (e.g., due to amplification artifacts) can occur in a clonal population without departing from clonality.
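One way to make the notion of clonality concrete is to compare each member of a population with a per-position majority consensus. The sketch below is a minimal illustration under stated assumptions: the reads are already aligned, ungapped and equal in length, and the 90% identity threshold in the hypothetical `is_clonal` helper is an arbitrary example value, not a definition taken from this disclosure.

```python
from collections import Counter


def consensus_and_identity(reads):
    """Return (consensus, fraction of reads identical to the consensus).

    Assumption: reads are aligned, ungapped and of equal length.
    """
    if not reads:
        raise ValueError("no reads supplied")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    # zip(*reads) yields one column per position; keep the majority base.
    consensus = "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )
    identical = sum(read == consensus for read in reads)
    return consensus, identical / len(reads)


def is_clonal(reads, min_identity=0.9, min_length=10):
    """Hypothetical clonality check with illustrative thresholds.

    The homogeneous sequence must be at least `min_length` nt long, and most
    reads must match the consensus exactly, tolerating a few variant reads
    (e.g., amplification artifacts).
    """
    consensus, identity = consensus_and_identity(reads)
    return len(consensus) >= min_length and identity >= min_identity


if __name__ == "__main__":
    population = ["ACGTACGTACGT"] * 9 + ["ACGTACTTACGT"]  # one read carries an error
    print(consensus_and_identity(population))
    print(is_clonal(population))
```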
As used herein, a "pattern of cytosine modification," also referred to as a "methylation profile," refers to the pattern with which both methylated and unmethylated cytosines are distributed in the genome of a cell or an organism. A "pattern" is inclusive of both modified cytosines and non-modified cytosines. The pattern can be defined in several distribution dimensions: by organ, by tissue, by status of disease or pathological condition (e.g., cancer, neurophysiological), by genome segment (e.g., chromosome or genetic coordinates on a chromosome), by gene, by CpG island, by a group of cytosines, or by the site of a modified cytosine. A pattern of cytosine modification can have a known correlation with a disease or pathological condition, or a correlation of a pattern of cytosine modification with a disease or pathological condition can be identified using methods described herein. A pattern of cytosine modification can be present at a specific locus (e.g., location) in a genome, and that specific location can be a single modified cytosine or a set of modified cytosines, e.g., a CpG island. A pattern of cytosine modification can be identified by using a predetermined sequence; e.g., a method of using an engineered deaminase can be designed and practiced with the intent of determining a pattern of cytosine modification, for instance, the methylation status of one or more specific cytosines, the methylation status of one or more specific cytosines present at a specific location of a genome, or a combination thereof.
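Because a pattern of cytosine modification records both modified and unmodified cytosines and can be summarized by genome segment or CpG island, one simple representation is a table of per-site calls aggregated over regions of interest. The sketch below assumes per-site calls are already available as a mapping from (chromosome, position) to True (modified) or False (unmodified); the `Region` class, region names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical named genomic interval, 0-based and half-open [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


def region_profile(calls, regions):
    """Summarize per-cytosine calls as {region name: (n_modified, n_cytosines)}.

    `calls` maps (chrom, position) -> True if modified, False if unmodified, so
    unmodified cytosines are counted in the denominator as part of the pattern.
    """
    profile = {}
    for region in regions:
        states = [
            modified
            for (chrom, pos), modified in calls.items()
            if chrom == region.chrom and region.start <= pos < region.end
        ]
        profile[region.name] = (sum(states), len(states))
    return profile


if __name__ == "__main__":
    # Example calls and regions are made up for illustration.
    calls = {
        ("chr1", 100): True,
        ("chr1", 105): False,
        ("chr1", 110): True,
        ("chr2", 5): True,
    }
    regions = [Region("island_A", "chr1", 95, 112), Region("island_B", "chr2", 0, 10)]
    print(region_profile(calls, regions))
```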
As used herein, the term "each," when used in reference to a collection of items, is intended to identify an individual item in the collection but does not necessarily refer to every item in the collection unless the context clearly dictates otherwise.
As used in this specification and the appended claims, the term "or" is generally employed in its sense including "and/or" unless the content clearly dictates otherwise. The term "and/or" means one or all of the listed elements or a combination of any two or more of the listed elements. The use of "and/or" in some instances does not imply that the use of "or" in other instances may not mean "and/or."
Unless otherwise specified, "a," "an," "the," and "at least one" are used interchangeably and mean one or more than one.
The words "preferred" and "preferably" refer to embodiments of the disclosure that may afford certain benefits, under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances. Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful, and is not intended to exclude other embodiments from the scope of the disclosure.
As used herein, "have," "has," "having," "include," "includes," "including," "comprise," "comprises," "comprising," or the like are used in their open-ended inclusive sense and generally mean "include, but not limited to," "includes, but not limited to," or "including, but not limited to." It is understood that wherever embodiments are described herein with the language "have," "has," "having," "include," "includes," "including," "comprise," "comprises," "comprising," and the like, otherwise analogous embodiments described in terms of "consisting of" and/or "consisting essentially of" are also provided. The term "consisting of" means including, and limited to, whatever follows the phrase "consisting of." That is, "consisting of" indicates that the listed elements are required or mandatory and that no other elements may be present. The term "consisting essentially of" indicates that any elements listed after the phrase are included, and that elements other than those listed may be included provided that those elements do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements.
Conditions that are "suitable" for an event to occur (or "suitable" conditions), such as converting 5-methylcytosine to thymine by deamination, are conditions that do not prevent such an event from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event.
As used herein, "providing" in the context of a protein, sample of DNA or RNA, or composition means making the protein, sample of DNA or RNA, or composition, purchasing the protein, sample of DNA or RNA, or composition, or otherwise obtaining the protein, sample of DNA or RNA, or composition.
Reference throughout this specification to "one embodiment," "an embodiment," "certain embodiments," or "some embodiments," etc., means that a particular feature, configuration, composition, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily referring to the same embodiment of the disclosure. Furthermore, the particular features, configurations, compositions, or characteristics may be combined in any suitable manner in one or more embodiments.
While polynucleotide sequences encoding an engineered deaminase are described herein as DNA sequences, it is understood that the complements, reverse sequences, and reverse complements of the DNA sequences can be easily determined by the skilled person. It is also understood that the sequences described herein as DNA sequences can be converted from a DNA sequence to an RNA sequence by replacing each thymidine nucleotide with a uracil nucleotide.
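The sequence manipulations mentioned above are mechanical. A minimal Python sketch, assuming plain A/C/G/T strings (upper or lower case) without IUPAC ambiguity codes, could look like this; the function names are illustrative.

```python
# Assumption: input contains only A, C, G, T (either case); other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Base-by-base complement, keeping the original orientation."""
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    """Sequence read in the opposite direction, without complementing."""
    return seq[::-1]


def reverse_complement(seq):
    """Complement read 5' to 3' on the opposite strand."""
    return complement(seq)[::-1]


def dna_to_rna(seq):
    """Replace each thymine with uracil to give the corresponding RNA sequence."""
    return seq.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    dna = "ATGCGT"
    print(complement(dna), reverse(dna), reverse_complement(dna), dna_to_rna(dna))
```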
The present disclosure is not limited to the specific details of construction, arrangement of components, or method steps set forth herein. The compositions and methods disclosed herein are capable of being made, practiced, used, carried out, and/or formed in various ways that will be apparent to one of skill in the art in light of the disclosure that follows. The phraseology and terminology used herein are for the purpose of description only and should not be regarded as limiting the scope of the claims. Ordinal indicators, such as first, second, and third, as used in the description and the claims to refer to various structures or method steps, are not meant to be construed to indicate any specific structures or steps, or any particular order or configuration of such structures or steps. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context.
Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if a concentration range is stated as 1% to 50%, values such as 2% to 40%, 10% to 30%, or 1% to 3% are intended to be expressly enumerated in this specification. These are only examples of what is specifically intended, and all possible combinations of numerical values between and including the lowest value and the highest value enumerated are to be considered to be expressly stated in this disclosure.
Use of the word “about” to describe a particular recited amount or range of amounts is meant to indicate that values very near to the recited amount are included in that amount, such as values that could or naturally would be accounted for due to manufacturing tolerances, instrument, and human error in forming measurements, and the like. All percentages referring to amounts are by weight unless indicated otherwise.
No admission is made that any reference, including any non-patent or patent document cited in this specification, constitutes prior art. In particular, it will be understood that, unless otherwise stated, reference to any document herein does not constitute an admission that any of these documents forms part of the common general knowledge in the art in the United States or in any other country. Any discussion of the references states what their authors assert, and the applicant reserves the right to challenge the accuracy and pertinence of any of the documents cited herein. All references cited herein are fully incorporated by reference, unless explicitly indicated otherwise. The present disclosure shall control in the event there are any disparities between any definitions and/or description found in the cited references.
The following examples are meant only to be illustrative and are not meant as limitations on the scope of the invention or of the appended claims.
EXAMPLES
Example 1:
In the following example, the inventors describe four experiments that demonstrate that a modified tRNA deaminase possesses single-stranded DNA (ssDNA) cytidine deaminase activity.
Experiment 1 - USER cleavage assay
A long (7,249-nucleotide), circular ssDNA substrate (M13 phage ssDNA, New England BioLabs #N4040S) was incubated with met7 at 70°C for 4 hours, treated with the USER enzyme to cleave uracil-containing DNA, and resolved on a TBE-urea gel. The ssDNA substrate was cleaved in the presence of met7 and USER, indicating that met7-mediated DNA deamination had occurred (FIG. 2A).
Experiment 2 - NGS assay
An ssDNA substrate containing 17 unmethylated cytosines (C) and 16 methylated cytosines (mC) (FIG. 2C) was incubated with met7 at 70°C for 16 hours. The oligo was then PCR-amplified, indexed, and sequenced on an Illumina MiniSeq (FIG. 2B). C and mC deamination events were quantified as C-to-T mutations at each site. A catalytically inactive met7 mutant (E46A) was used as a negative control. met7 showed significantly higher deamination than the E46A mutant across all sites in the ssDNA oligo (FIG. 2D), demonstrating that met7 is a bona fide ssDNA cytidine deaminase.
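The per-site quantification described in Experiment 2, counting C-to-T changes at each cytosine, can be sketched as follows. This is an illustrative sketch only: it assumes reads are already aligned to the substrate reference without insertions or deletions, treats any base other than C or T at a reference C as uninformative, and uses a hypothetical `conversion_fractions` function rather than the analysis pipeline actually used. Applying it separately to reads from met7-treated and E46A-treated samples would support the per-site comparison described above.

```python
import math


def conversion_fractions(reference, reads):
    """Fraction of reads showing T at each cytosine of the reference.

    Assumptions: reads are gapless, aligned to `reference`, and of equal length.
    Bases other than C or T at a reference C (e.g., sequencing errors) are ignored.
    Returns {0-based position: fraction}, with NaN where no read is informative.
    """
    fractions = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        informative = [read[pos] for read in reads if read[pos] in "CT"]
        if informative:
            fractions[pos] = informative.count("T") / len(informative)
        else:
            fractions[pos] = math.nan
    return fractions


if __name__ == "__main__":
    reference = "ACGTCA"  # toy reference, not the FIG. 2C oligo
    reads = ["ATGTTA", "ATGTCA", "ACGTTA", "ATGTCA"]
    print(conversion_fractions(reference, reads))
```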
Experiment 3 - SwaI assay
An ssDNA substrate comprising a single cytosine (FIG. 3B), deamination of which results in formation of a SwaI restriction site, was incubated with met7 and then with the restriction enzyme SwaI (FIG. 3A). Met7 residues that are important for its deaminase activity (C67, C70, and E46) were identified by comparing structural models of met7 and APOBEC3A (FIG. 3C), and met7 variants carrying alanine substitutions at these residues (C67A, C70A, and E46A) were generated for use as catalytically inactive negative control enzymes. Incubation with met7 resulted in the formation of a smaller, cleaved DNA product, which was resolved by denaturing gel electrophoresis, whereas incubation with the catalytically inactive met7 mutants did not (FIG. 3D). Preincubating wild-type met7 at 70°C before mixing it with the ssDNA oligo substrate did not impair its deaminase activity. In contrast, the deaminase activity of APOBEC3A (A3A), a mesophilic enzyme, was abrogated upon pre-incubation at 70°C. Thus, these data demonstrate that met7 maintains its ssDNA deaminase activity following exposure to high temperature.
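The logic of the SwaI readout can be illustrated in a few lines. SwaI recognizes the palindromic sequence 5'-ATTTAAAT-3' and cuts bluntly between ATTT and AAAT, so, for example, a substrate containing ATCTAAAT acquires the site only if its single C is deaminated. In the sketch below the substrate sequence is hypothetical (the actual oligo is shown in FIG. 3B), and uracil is treated as equivalent to thymine when searching for the site, which is a modeling simplification.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence; blunt cut after ATTT


def deaminate(seq, positions):
    """Return `seq` with the cytosines at the given 0-based positions changed to U."""
    bases = list(seq)
    for pos in positions:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not C")
        bases[pos] = "U"
    return "".join(bases)


def swai_cut_positions(seq):
    """0-based cut positions (cleavage occurs just before each returned index).

    Simplifying assumption: uracil is read as thymine for site matching.
    """
    text = seq.replace("U", "T")
    cuts = []
    start = text.find(SWAI_SITE)
    while start != -1:
        cuts.append(start + 4)
        start = text.find(SWAI_SITE, start + 1)
    return cuts


if __name__ == "__main__":
    substrate = "GGAGATCTAAATGGAG"  # hypothetical oligo with a single C at index 6
    print(swai_cut_positions(substrate))                  # no site before deamination
    print(swai_cut_positions(deaminate(substrate, [6])))  # one site after C -> U
```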
Experiment 4 - SwaI assay with oxidized methylcytosine derivatives
A similar SwaI assay was performed using ssDNA substrates containing C, mC, hydroxymethylcytosine (hmC), formylcytosine (fC), or carboxylcytosine (caC). Met7 showed considerable activity towards C, mC, and hmC, but not towards fC or caC (FIG. 4). This assay was performed at both 70°C and 37°C. Met7 was active at 70°C but not at 37°C, whereas APOBEC3A was active at 37°C but not at 70°C (FIG. 4). Further, in these assays, a complementary oligo was mixed with the ssDNA substrate prior to enzyme addition to render the substrate double-stranded, which would be expected to prevent met7 from binding to and deaminating the substrate. However, met7 showed robust deamination activity at 70°C, likely because this high temperature denatures the double-stranded substrate DNA, allowing it to be accessed by met7. When this assay was repeated for met7 at 80°C, robust deamination was again observed (FIG. 5), confirming that met7 is active at high reaction temperatures.
Materials and Methods
met7 expression and purification
150 µL of overnight saturated Y130A/Y132H BL21 culture was inoculated into 15 mL Luria-Bertani (LB) medium supplemented with kanamycin (50 µg/mL) and incubated at 37°C and 250 rpm to an OD600 of ~0.4-0.6. Protein expression was induced by addition of 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). The cells were incubated for 3 hours at 37°C and 250 rpm and harvested by centrifugation. Protein purification was performed at 4°C. Briefly, cell pellets were resuspended in 1 mL lysis buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, 1 mM DTT, 10 mM imidazole, 1X GoldBio Protease Inhibitor Cocktail), and cells were disrupted by sonication on ice (5 seconds on at 90% amplitude, 30 seconds off, total sonication time 90 seconds). The lysate was then clarified by centrifugation and filtration. Next, 800 µL HisPur Ni-NTA resin (Thermo 88222) slurry (pre-washed with lysis buffer without GoldBio Protease Inhibitor Cocktail) was added to the crude cell lysate and incubated for 1 hour at 4°C. The resin was then washed three times with 500 µL wash buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, 40 mM imidazole, 1 mM DTT). 100 µL elution buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, 250 mM imidazole) was incubated with the resin for 1 minute on ice and then spun down, and the eluate was collected as flowthrough. Buffer exchange was performed to concentrate the protein to a final volume of ~50 µL. This was done by topping up to ~500 µL with storage buffer (20 mM Tris pH 7.5, 200 mM NaCl, 5% (v/v) glycerol, 0.01% (v/v) Tween-20, 1 mM DTT) twice upon concentration to ~50 µL. The final protein purity and concentration were determined by SDS-PAGE and NanoDrop.
met7 variant lysate preparation
BL21(DE3) cells were transformed with expression vectors encoding the met7 variants, induced at 16°C overnight, centrifuged, and lysed by incubation at room temperature for 15 minutes in 1x BugBuster. Cell lysates were normalized to an equivalent of OD = 0.3 and incubated with 10 nM ssDNA substrate at 70°C for 6 hours in a 10 µL reaction in glycylglycine pH 8.0 buffer.
USER cleavage assay
8.7 µM recombinantly expressed and purified met7 protein was incubated with 100 ng M13 DNA (NEB #N4040S, 7,249-nucleotide circular ssDNA) in 1x Citrate Buffer (Thermo #005000) and 0.1 mg/mL NEB Monarch RNase A (NEB #T3018-2) at 70°C for 4 hours in a 10 µL reaction. Subsequently, 1.13 µL 10x CutSmart Buffer was added to the sample. 5 µL of the reaction was moved to a fresh tube, mixed with 0.75 µL USER enzyme (NEB #M5505L), and incubated at 37°C for 1 hour to cleave uracil-containing DNA. Finally, 1 µL proteinase K (NEB #P8107S) was added, and the reaction was incubated at 40°C for 1 hour to digest bound proteins. Sample DNA was then mixed with an equal volume of Novex 2x TBE-urea sample buffer and resolved on a 6% Novex TBE-urea gel. The gel was subsequently stained with SYBR Gold and imaged on a ChemiDoc MP instrument (Bio-Rad).
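Adding 1.13 µL of 10x CutSmart Buffer to the 10 µL deamination reaction brings the buffer close to its 1x working strength before the USER digest. The helper below is a hypothetical illustration of that mixing calculation and assumes the deamination reaction contains no CutSmart Buffer beforehand.

```python
def final_strength(stock_x, added_ul, starting_ul):
    """Working strength (in 'x' units) after adding `added_ul` of a `stock_x`
    buffer to `starting_ul` of a solution.

    Assumption: the starting solution contains none of that buffer.
    """
    return stock_x * added_ul / (starting_ul + added_ul)


if __name__ == "__main__":
    strength = final_strength(10, 1.13, 10)
    print(f"{strength:.3f}x")  # about 1.015x in a total of 11.13 µL
```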
NGS assay
8.7 µM recombinantly expressed and purified met7 protein was incubated with 10 nM ssDNA substrate oligo at 70°C for 16 hours in 1x Citrate Buffer (Thermo #005000) in a 10 µL deamination reaction. The deamination reactions were then diluted with 90 µL of 10 mM Tris-HCl pH 8. Subsequently, 1 µL of each diluted deamination reaction was amplified using PCR primers that add adapter and barcode sequences. PCR reactions were performed using Q5U® Hot Start High-Fidelity DNA Polymerase (New England Biolabs #M0515S), and the resulting sequencing libraries were sequenced on an Illumina MiniSeq instrument.
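As a worked example of the template input for library PCR: diluting the 10 µL deamination reaction (10 nM oligo) with 90 µL of buffer gives 1 nM, so the 1 µL used as PCR template carries about 1 fmol, or roughly 6 × 10^8 oligo molecules. The sketch below reproduces this arithmetic, assumes no loss of DNA during dilution, and uses an illustrative function name.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def template_molecules(stock_nm, reaction_ul, diluent_ul, template_ul):
    """Number of oligo molecules in the PCR template aliquot.

    Assumption: ideal mixing and no loss of DNA during dilution.
    """
    diluted_nm = stock_nm * reaction_ul / (reaction_ul + diluent_ul)
    moles = diluted_nm * 1e-9 * template_ul * 1e-6  # (mol/L) x (L) = mol
    return moles * AVOGADRO


if __name__ == "__main__":
    n = template_molecules(stock_nm=10, reaction_ul=10, diluent_ul=90, template_ul=1)
    print(f"{n:.2e} molecules")
```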
SwaI assay
Deaminase enzyme (i.e., wild-type met7, a catalytically inactive met7 mutant, or APOBEC3A (NEB; 10X dilution)) was pre-incubated at room temperature, 70°C, or 80°C. Next, 500 nM ssDNA substrate comprising a single unmethylated cytosine (C), methylated cytosine (mC), hydroxymethylcytosine (hmC), formylcytosine (fC), or carboxylcytosine (caC) was incubated with ~10 µM deaminase enzyme and reacted for 15 hours in 1X Citrate Buffer, pH 6.0 (ThermoFisher Scientific, Catalog no. 005000) at 37°C, 70°C, or 80°C. Deamination was terminated by incubation at 95°C for 1 minute. Complementary oligonucleotide was then added at 3X molar excess, and standard annealing was performed. 8 µL of the annealed mixture was incubated with SwaI (NEB) in a SwaI-compatible buffer overnight at room temperature. The next day, Novex® TBE-Urea Sample Buffer (Catalog no. LC6876) was added at 1:1 v/v, and the mixture was incubated at 70°C for 3 minutes. Samples were loaded onto a Novex® 15% TBE-Urea Gel 1.0MM 15W (Cat no: EC68855BOX) and imaged using the FAM filter on a ChemiDoc MP imaging system (Bio-Rad). For some experiments, the ssDNA substrate was pre-annealed with 2X molar excess complementary oligonucleotide prior to incubation with the deaminase enzyme. This assay was adapted from Schutsky et al. (Nucleic Acids Research 45: 7655-7665, 2017).
Example 2:
The experiments in Example 1 demonstrate that a modified tRNA deaminase possesses ssDNA cytidine deaminase activity. In the following example, the inventors identify additional modifications to the tRNA deaminase that further increase this enzyme’s deaminase activity.
The crystal structure of CDAT8 (PDB: 3G8Q) was examined, and four loops that are predicted to lie near the bound substrate were identified (FIG. 6). Residues within these loops as well as others may be selected for mutagenesis. Exemplary mutations include, without limitation, those at P20, R22, T23, A40, D41, D42, E43, T63, A64, P66, G86, R87, G88, R89, and G90 of met7.
The deaminase activity of the met7 mutants was tested using the NGS assay described in Example 1. Met7 mutants were expressed as recombinant proteins in bacteria. A subset of the tested mutant proteins (e.g., mutants at A64 and G88) was purified using Ni-NTA prior to use in this assay; crude bacterial lysate was used to test the remaining mutants. Multiple substitutions at G88 and A64 were found to dramatically increase the deamination activity of met7 (FIG. 7). For example, G88F and G88I show near-complete deamination of the ssDNA oligo, whereas wild-type met7 deaminates only a minority of these sites. The activity of the D41K, D41R, D42K, G90K, and G90W mutants was also significantly higher than that of wild-type met7 (FIG. 8). The activity of all tested amino acid substitution mutations is characterized in Table 2, below.
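The mutations discussed here use the conventional notation of wild-type residue, position and substituted residue (for example, G88F replaces glycine 88 with phenylalanine). A minimal sketch for parsing such labels and applying them to a protein sequence follows. Because SEQ ID NO: 2 is not reproduced in this section, the example uses a short made-up sequence, positions are assumed to be 1-based, and all function names are hypothetical.

```python
import re
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_substitution(label):
    """Split a label such as 'G88F' into ('G', 88, 'F')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def group_by_position(labels):
    """Collect substituted residues per (wild-type residue, position)."""
    grouped = defaultdict(list)
    for label in labels:
        wild_type, position, mutant = parse_substitution(label)
        grouped[(wild_type, position)].append(mutant)
    return dict(grouped)


def apply_substitution(sequence, label):
    """Return `sequence` with the substitution applied (1-based positions).

    Raises ValueError if the residue at that position does not match the label.
    """
    wild_type, position, mutant = parse_substitution(label)
    if not 1 <= position <= len(sequence) or sequence[position - 1] != wild_type:
        raise ValueError(f"{label} does not match the supplied sequence")
    return sequence[: position - 1] + mutant + sequence[position:]


if __name__ == "__main__":
    print(group_by_position(["G88F", "G88I", "A64K", "D41K", "D41R"]))
    print(apply_substitution("MSGAE", "G3W"))  # made-up sequence, not SEQ ID NO: 2
```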
Table 2. Deaminase activity of met7 mutants as compared to wild-type met7
Claims
1. An engineered deaminase that binds to single-stranded DNA (ssDNA) and that deaminates ssDNA at a temperature that destabilizes ssDNA secondary structure, wherein the engineered deaminase comprises one or more modifications relative to a wild-type enzyme, and optionally wherein the wild-type enzyme is a tRNA deaminase.
2. An engineered deaminase with increased ssDNA deaminase activity relative to a wild-type enzyme, wherein the engineered deaminase comprises one or more modifications relative to the wild-type enzyme that provide increased deaminase activity, optionally wherein the wild-type enzyme is a tRNA deaminase.
3. The engineered deaminase of claim 1 or 2, wherein the engineered deaminase converts methylcytosine (mC) to thymine (T) in ssDNA.
4. The engineered deaminase of claim 3, wherein the engineered deaminase converts mC to T in ssDNA at a greater rate than it converts cytosine (C) to uridine (U) in ssDNA.
5. The engineered deaminase of claim 3 or 4, wherein the engineered deaminase converts mC to T in ssDNA at a temperature between 50°C and 90°C.
6. The engineered deaminase of any one of claims 1-5, wherein the engineered deaminase converts C to U in ssDNA.
7. The engineered deaminase of claim 6, wherein the engineered deaminase converts C to U in ssDNA at a temperature between 50°C and 90°C.
8. The engineered deaminase of any one of claims 1-7, wherein the engineered deaminase converts hydroxymethylcytosine (hmC) to hydroxymethyluridine (hmU) in ssDNA.
9. The engineered deaminase of claim 8, wherein the engineered deaminase converts hmC to hmU in ssDNA at a temperature between 50°C and 90°C.
10. The engineered deaminase of any one of claims 1-9, wherein the wild-type enzyme is selected from the group consisting of SEQ ID NOs: 1 and 6-68.
11. The engineered deaminase of any one of claims 1-10, wherein the wild-type enzyme is a tRNA deaminase.
12. The engineered deaminase of claim 11, wherein the wild-type enzyme is CDAT8 from Methanopyrus kandleri and the engineered deaminase is a truncated form of the wild-type enzyme that comprises SEQ ID NO: 2.
13. The engineered deaminase of any one of claims 1-12, wherein the engineered deaminase comprises an amino acid sequence with at least 70% sequence identity to SEQ ID NO: 2.
14. The engineered deaminase of any one of claims 1-13, wherein the engineered deaminase consists essentially of an amino acid sequence with at least 70% sequence identity to SEQ ID NO: 2.
15. The engineered deaminase of any one of claims 1-14, wherein the one or more modifications comprise an amino acid substitution mutation in an active site-adjacent residue of a deaminase domain.
16. The engineered deaminase of claim 15, wherein the active site-adjacent residue is functionally equivalent to a residue selected from the group consisting of P20, R22, T23, A40, D41, D42, E43, T63, A64, P66, G86, R87, G88, R89, G90, and any combination thereof in SEQ ID NO: 2, optionally wherein the active site-adjacent residue is functionally equivalent to a residue selected from the group consisting of A64, G88, G90, D41, D42, T23, E43, and any combination thereof in SEQ ID NO: 2.
17. The engineered deaminase of claim 16, wherein the amino acid substitution mutation is selected from A64K, A64L, A64M, A64R, A64W, G88C, G88D, G88F, G88H, G88I, G88K, G88L, G88M, G88N, G88P, G88Q, G88R, G88S, G88V, G90W, G90K, D41K, D41R, D42K, D42R, T23K, E43K, and any combination thereof.
18. The engineered deaminase of any one of the preceding claims, wherein the one or more modifications comprise a supercharged surface modification.
19. The engineered deaminase of any one of the preceding claims, wherein the one or more modifications comprise fusion to a nucleic-acid binding domain or a guide nucleic acid.
20. A composition comprising the engineered deaminase of any one of the preceding claims and a buffer.
21. The composition of claim 20 further comprising DNA obtained from a sample.
22. The composition of claim 21, wherein the DNA is ssDNA.
23. The composition of claim 21 or 22, wherein the DNA comprises at least one modified cytosine (modC).
24. The composition of claim 23, wherein the DNA comprises at least one mC.
25. The composition of any one of claims 21-24, wherein the DNA is genomic DNA or cell free DNA.
26. The composition of claim 25, wherein the genomic DNA is from a single cell or a mixture from a plurality of cells.
27. A polynucleotide encoding the engineered deaminase of any one of claims 1-19.
28. A vector comprising the polynucleotide of claim 27.
29. A cell that expresses the engineered deaminase of any one of claims 1-19.
30. A method comprising contacting a sample comprising ssDNA suspected of having at least one modified cytosine (modC) with the engineered deaminase of any one of claims 1-19 under conditions suitable for conversion of the modC by deamination, optionally wherein the modified cytosine is 5mC or 5hmC.
31. The method of claim 30, wherein the sample is contacted with the engineered deaminase at a temperature that destabilizes ssDNA secondary structure.
32. The method of claim 30 or 31, wherein the sample is contacted with the engineered deaminase at a temperature between 50°C and 90°C.
33. The method of any one of claims 30-32 further comprising denaturing double-stranded DNA (dsDNA) in a sample to prepare ssDNA prior to contacting the ssDNA with the engineered deaminase to produce deaminated ssDNA.
34. The method of any one of claims 30-33 further comprising converting the deaminated ssDNA into a sequencing library.
35. The method of claim 34 further comprising: providing a surface comprising a plurality of amplification sites, wherein the amplification sites comprise at least two populations of attached single-stranded capture oligonucleotides having a free 3' end, and contacting the surface comprising amplification sites with the sequencing library under conditions suitable to produce a plurality of amplification sites that each comprise a clonal population of amplicons from an individual member of the sequencing library.
36. A method comprising processing a sample of DNA suspected of comprising dsDNA comprising at least one modC to produce a sequencing library, denaturing the sequencing library to result in ssDNA, contacting the ssDNA with the engineered deaminase of any one of claims 1-19 under conditions suitable for conversion of the modC by deamination, and converting the converted ssDNA to a converted dsDNA sequencing library.
37. The method of claim 36, wherein the sample is contacted with the engineered deaminase at a temperature that destabilizes ssDNA secondary structure.
38. The method of claim 37, wherein the sample is contacted with the engineered deaminase at a temperature between 50°C and 90°C.
39. The method of any one of claims 36-38, wherein the sample comprises a biological sample.
40. The method of claim 39, wherein the biological sample comprises genomic DNA or cell-free DNA.
41. The method of claim 39 or 40, wherein the biological sample comprises a biological fluid.
42. The method of any one of claims 39-41, wherein the biological sample comprises cells or isolated nuclei.
43. The method of any one of claims 39-42, wherein the biological sample comprises a tissue.
44. The method of claim 43, wherein the tissue comprises tumor tissue.
45. The method of any one of claims 36-44, wherein the converting comprises fragmentation or tagmentation of the double-stranded DNA and addition of a universal sequence to the double-stranded DNA fragments.
46. The method of claim 45, wherein the universal sequence is part of an adapter added to the double-stranded DNA fragments.
47. The method of any one of claims 36-46, wherein the converting comprises amplifying the converted single-stranded DNA to be the converted double-stranded DNA.
48. The method of any one of claims 36-47 further comprising: providing a surface comprising a plurality of amplification sites, wherein the amplification sites comprise at least two populations of attached single-stranded capture oligonucleotides having a free 3' end; and contacting the surface comprising amplification sites with the converted double-stranded DNA sequencing library under conditions suitable to produce a plurality of amplification sites that each comprise a clonal population of amplicons from an individual member of the converted double-stranded DNA sequencing library.
49. A method of detecting the location of a modified cytosine (modC) in a target nucleic acid, the method comprising: contacting target nucleic acids suspected of comprising at least one modC with the engineered deaminase of any one of claims 1-19 under conditions suitable for conversion of modC to produce converted nucleic acid comprising at least one converted modC; and detecting the at least one converted modC in the converted nucleic acid.
50. The method of claim 49, wherein the sample is contacted with the engineered deaminase at a temperature that destabilizes ssDNA secondary structure.
51. The method of claim 50, wherein the sample is contacted with the engineered deaminase at a temperature between 50°C and 90°C.
52. The method of any one of claims 49-51, wherein the detecting comprises identifying thymidine nucleotides in the converted nucleic acid to determine the location of 5mC nucleotides in the target nucleic acid.
53. The method of any one of claims 49-52, wherein the detecting comprises sequencing the converted nucleic acid or hybridizing one or more nucleic acid probes to the converted nucleic acid.
54. The method of claim 53, wherein the detecting comprises sequencing the converted nucleic acid and further comprises comparing the sequence of the converted nucleic acid with an untreated reference sequence to determine which cytosines in the target nucleic acid are modified.
55. The method of claim 54, wherein a predetermined sequence of the untreated reference sequence and a predetermined sequence of the converted nucleic acid are compared.
56. The method of any one of claims 49-55, further comprising preparing ssDNA from a sample comprising the target nucleic acid.
57. The method of claim 56, wherein the sample comprises a biological sample.
58. The method of claim 57, wherein the biological sample comprises genomic DNA or cell-free DNA.
59. The method of claim 57 or 58, wherein the biological sample comprises a biological fluid.
60. The method of any one of claims 57-59, wherein the biological sample comprises cells or isolated nuclei.
61. The method of any one of claims 57-60, wherein the biological sample comprises a tissue.
62. The method of claim 61, wherein the tissue comprises tumor tissue.
63. The method of any one of claims 49-62, wherein the target nucleic acids are obtained from a subject, wherein the detecting comprises obtaining a pattern of cytosine modification in the converted nucleic acids, the method further comprising comparing the pattern of cytosine modification in the converted nucleic acids with the pattern of cytosine modification in a reference nucleic acid.
64. The method of claim 63, wherein the subject has or is at risk of having a disease or condition, and wherein the reference nucleic acid is from a normal subject.
65. The method of claim 64, wherein the pattern of cytosine modification is linked in cis to a coding region that is correlated with the disease or condition.
66. The method of claim 65, wherein the comparing further comprises determining if the pattern of cytosine modification in the converted target nucleic acid indicates that the coding region is transcriptionally active or transcriptionally inactive in the subject.
67. The method of claim 66, wherein transcription of the coding region is correlated with the disease or condition.
68. The method of any one of claims 64-67, wherein the subject has the disease or condition and is undergoing treatment for the disease or condition, the method further comprising determining if the treatment is correlated with a change in the pattern of cytosine modification in the subject.
69. The method of any one of claims 64-67, wherein the subject previously had the disease or condition, the method further comprising comparing the pattern of cytosine modification in the subject with the pattern of cytosine modification in the subject when the subject had the disease or condition.
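The detection step of claims 52 to 55 and the pattern comparison of claims 63 and 64 can be illustrated with a short Python sketch. It is not the applicant's method. It assumes, following claim 52, that a converted modC reads as T while an unmodified C still reads as C. It also assumes each converted read is already aligned to its untreated reference on a single strand with no gaps. The names `SiteCall`, `call_modified_cytosines`, `modification_pattern`, and `compare_patterns` are hypothetical. A real pipeline would also need to handle the opposite strand, sequence variants, and incomplete conversion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SiteCall:
    """Modification call at one reference cytosine (hypothetical helper type)."""
    position: int   # 0-based index into the untreated reference sequence
    modified: bool  # True when the converted read shows T at a reference C


def call_modified_cytosines(
    reference: str,
    converted: str,
    region: Optional[Tuple[int, int]] = None,
) -> List[SiteCall]:
    """Compare a converted read with its untreated reference (cf. claims 52-55).

    ASSUMPTIONS: the read is already aligned to the reference with no gaps, and
    the chemistry converts modC so that it reads as T while unmodified C reads as C.
    `region` is an optional half-open [start, end) window, standing in for a
    "predetermined sequence".
    """
    if len(reference) != len(converted):
        raise ValueError("this sketch expects an equal-length, gap-free alignment")
    start, end = region if region is not None else (0, len(reference))
    calls: List[SiteCall] = []
    for i in range(start, end):
        if reference[i].upper() != "C":
            continue  # only reference cytosines are informative here
        base = converted[i].upper()
        if base == "T":
            calls.append(SiteCall(i, True))   # converted: called as modified
        elif base == "C":
            calls.append(SiteCall(i, False))  # unconverted: called as unmodified
        # any other base (variant or sequencing error) is left uncalled
    return calls


def modification_pattern(calls: List[SiteCall]) -> Dict[int, bool]:
    """Collapse per-site calls into a position -> modified map."""
    return {call.position: call.modified for call in calls}


def compare_patterns(sample: Dict[int, bool], reference: Dict[int, bool]) -> List[int]:
    """Positions assessed in both patterns whose modification state differs (cf. claims 63-64)."""
    shared = sorted(sample.keys() & reference.keys())
    return [pos for pos in shared if sample[pos] != reference[pos]]


if __name__ == "__main__":
    ref = "ACGTCCGA"
    subject = modification_pattern(call_modified_cytosines(ref, "ATGTCTGA"))
    normal = modification_pattern(call_modified_cytosines(ref, "ACGTCCGA"))
    print(subject)                            # {1: True, 4: False, 5: True}
    print(compare_patterns(subject, normal))  # [1, 5]
```

Restricting `region` loosely mirrors comparing a predetermined sequence (claim 55). Only positions assessed in both samples are compared, so uncalled sites in either pattern do not produce spurious differences.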
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363589607P | 2023-10-11 | 2023-10-11 | |
US63/589,607 | 2023-10-11 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2025081064A2 (en) | 2025-04-17 |
WO2025081064A3 (en) | 2025-05-15 |
Family ID: 93378261
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2024/051081 (Pending; published as WO2025081064A2) | 2023-10-11 | 2024-10-11 | Thermophilic deaminase and methods for identifying modified cytosine |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2025081064A2 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW202237845A (en) * | 2020-12-11 | 2022-10-01 | Intellia Therapeutics, Inc. | Polynucleotides, compositions, and methods for genome editing involving deamination |
KR20240114296A (en) * | 2021-11-03 | 2024-07-23 | Intellia Therapeutics, Inc. | Polynucleotides, compositions, and methods for genome editing |
WO2023081855A1 (en) * | 2021-11-05 | 2023-05-11 | Metagenomi, Inc. | Base editing enzymes |
2024-10-11: Application PCT/US2024/051081 (WO), published as WO2025081064A2; status: active, pending
Patent Citations (101)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
US4683202B1 (en) | 1985-03-28 | 1990-11-27 | Cetus Corp | |
US4683195B1 (en) | 1986-01-30 | 1990-11-27 | Cetus Corp | |
US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
EP0320308B1 (en) | 1987-12-11 | 1993-11-03 | Abbott Laboratories | Method for detecting a target nucleic acid sequence |
EP0336731B1 (en) | 1988-04-06 | 1994-05-11 | City Of Hope | Method of amplifying and detecting nucleic acid sequences |
WO1989009835A1 (en) | 1988-04-08 | 1989-10-19 | The Salk Institute For Biological Studies | Ligase-based amplification method |
WO1989012696A1 (en) | 1988-06-24 | 1989-12-28 | Amgen Inc. | Method and reagents for detecting nucleic acid sequences |
US5130238A (en) | 1988-06-24 | 1992-07-14 | Cangene Corporation | Enhanced nucleic acid amplification process |
WO1990001069A1 (en) | 1988-07-20 | 1990-02-08 | Segev Diagnostics, Inc. | Process for amplifying and detecting nucleic acid sequences |
US5185243A (en) | 1988-08-25 | 1993-02-09 | Syntex (U.S.A.) Inc. | Method for detection of specific nucleic acid sequences |
WO1991006678A1 (en) | 1989-10-26 | 1991-05-16 | Sri International | Dna sequencing |
EP0439182B1 (en) | 1990-01-26 | 1996-04-24 | Abbott Laboratories | Improved method of amplifying target nucleic acids applicable to both polymerase and ligase chain reactions |
US5573907A (en) | 1990-01-26 | 1996-11-12 | Abbott Laboratories | Detecting and amplifying target nucleic acids using exonucleolytic activity |
US5223414A (en) | 1990-05-07 | 1993-06-29 | Sri International | Process for nucleic acid hybridization and amplification |
US5455166A (en) | 1991-01-31 | 1995-10-03 | Becton, Dickinson And Company | Strand displacement amplification |
US5679524A (en) | 1994-02-07 | 1997-10-21 | Molecular Tool, Inc. | Ligase/polymerase mediated genetic bit analysis of single nucleotide polymorphisms and its use in genetic analysis |
US6214587B1 (en) | 1994-03-16 | 2001-04-10 | Gen-Probe Incorporated | Isothermal strand displacement nucleic acid amplification |
US6172218B1 (en) | 1994-10-13 | 2001-01-09 | Lynx Therapeutics, Inc. | Oligonucleotide tags for sorting and identification |
US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
US6258568B1 (en) | 1996-12-23 | 2001-07-10 | Pyrosequencing Ab | Method of sequencing DNA based on the detection of the release of pyrophosphate and enzymatic nucleotide degradation |
US6859570B2 (en) | 1997-03-14 | 2005-02-22 | Trustees Of Tufts College, Tufts University | Target analyte sensors utilizing microspheres |
US7622294B2 (en) | 1997-03-14 | 2009-11-24 | Trustees Of Tufts College | Methods for detecting target analytes and enzymatic reactions |
US6266459B1 (en) | 1997-03-14 | 2001-07-24 | Trustees Of Tufts College | Fiber optic sensor with encoded microspheres |
US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
WO1998044151A1 (en) | 1997-04-01 | 1998-10-08 | Glaxo Group Limited | Method of nucleic acid amplification |
US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
WO2000018957A1 (en) | 1998-09-30 | 2000-04-06 | Applied Research Systems Ars Holding N.V. | Methods of nucleic acid amplification and sequencing |
US7115400B1 (en) | 1998-09-30 | 2006-10-03 | Solexa Ltd. | Methods of nucleic acid amplification and sequencing |
US6355431B1 (en) | 1999-04-20 | 2002-03-12 | Illumina, Inc. | Detection of nucleic acid amplification reactions using bead arrays |
WO2000063437A2 (en) | 1999-04-20 | 2000-10-26 | Illumina, Inc. | Detection of nucleic acid reactions on bead arrays |
US6274320B1 (en) | 1999-09-16 | 2001-08-14 | Curagen Corporation | Method of sequencing a nucleic acid |
US7611869B2 (en) | 2000-02-07 | 2009-11-03 | Illumina, Inc. | Multiplexed methylation detection methods |
US8003354B2 (en) | 2000-02-07 | 2011-08-23 | Illumina, Inc. | Multiplex nucleic acid reactions |
US6770441B2 (en) | 2000-02-10 | 2004-08-03 | Illumina, Inc. | Array compositions and methods of making same |
US7001792B2 (en) | 2000-04-24 | 2006-02-21 | Eagle Research & Development, Llc | Ultra-fast nucleic acid sequencing device and a method for making and using the same |
US7329492B2 (en) | 2000-07-07 | 2008-02-12 | Visigen Biotechnologies, Inc. | Methods for real-time single molecule sequence determination |
US7211414B2 (en) | 2000-12-01 | 2007-05-01 | Visigen Biotechnologies, Inc. | Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity |
WO2002046456A1 (en) | 2000-12-08 | 2002-06-13 | Applied Research Systems Ars Holding N.V. | Isothermal amplification of nucleic acids on a solid support |
US7582420B2 (en) | 2001-07-12 | 2009-09-01 | Illumina, Inc. | Multiplex nucleic acid reactions |
US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
US7427673B2 (en) | 2001-12-04 | 2008-09-23 | Illumina Cambridge Limited | Labelled nucleotides |
US20060188901A1 (en) | 2001-12-04 | 2006-08-24 | Solexa Limited | Labelled nucleotides |
US7399590B2 (en) | 2002-02-21 | 2008-07-15 | Asm Scientific, Inc. | Recombinase polymerase amplification |
US9309502B2 (en) | 2002-02-21 | 2016-04-12 | Alere San Diego Inc. | Recombinase polymerase amplification |
US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
WO2004018497A2 (en) | 2002-08-23 | 2004-03-04 | Solexa Limited | Modified nucleotides for polynucleotide sequencing |
US7829284B2 (en) | 2002-09-20 | 2010-11-09 | New England Biolabs, Inc. | Helicase-dependent amplification of nucleic acids |
US7670810B2 (en) | 2003-06-20 | 2010-03-02 | Illumina, Inc. | Methods and compositions for whole genome amplification and genotyping |
US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
US20110059865A1 (en) | 2004-01-07 | 2011-03-10 | Mark Edward Brennan Smith | Modified Molecular Arrays |
US8563477B2 (en) | 2004-01-07 | 2013-10-22 | Illumina Cambridge Limited | Modified molecular arrays |
WO2005065814A1 (en) | 2004-01-07 | 2005-07-21 | Solexa Limited | Modified molecular arrays |
WO2005068656A1 (en) | 2004-01-12 | 2005-07-28 | Solexa Limited | Nucleic acid characterisation |
US7315019B2 (en) | 2004-09-17 | 2008-01-01 | Pacific Biosciences Of California, Inc. | Arrays of optical confinements and uses thereof |
WO2006064199A1 (en) | 2004-12-13 | 2006-06-22 | Solexa Limited | Improved method of nucleotide detection |
US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
US20090011943A1 (en) | 2005-06-15 | 2009-01-08 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20070099208A1 (en) | 2005-06-15 | 2007-05-03 | Radoje Drmanac | Single molecule arrays for genetic and chemical analysis |
WO2007010251A2 (en) | 2005-07-20 | 2007-01-25 | Solexa Limited | Preparation of templates for nucleic acid sequencing |
US7405281B2 (en) | 2005-09-29 | 2008-07-29 | Pacific Biosciences Of California, Inc. | Fluorescent nucleotide analogs and uses therefor |
US7741463B2 (en) | 2005-11-01 | 2010-06-22 | Illumina Cambridge Limited | Method of preparing libraries of template polynucleotides |
US20090005252A1 (en) | 2006-02-24 | 2009-01-01 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090118488A1 (en) | 2006-02-24 | 2009-05-07 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090155781A1 (en) | 2006-02-24 | 2009-06-18 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20090264299A1 (en) | 2006-02-24 | 2009-10-22 | Complete Genomics, Inc. | High throughput genome sequencing on DNA arrays |
US20080009420A1 (en) | 2006-03-17 | 2008-01-10 | Schroth Gary P | Isothermal methods for creating clonal single molecule arrays |
US8241573B2 (en) | 2006-03-31 | 2012-08-14 | Illumina, Inc. | Systems and devices for sequence by synthesis analysis |
WO2007123744A2 (en) | 2006-03-31 | 2007-11-01 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
US20080108082A1 (en) | 2006-10-23 | 2008-05-08 | Pacific Biosciences Of California, Inc. | Polymerase enzymes and reagents for enhanced nucleic acid sequencing |
US7910354B2 (en) | 2006-10-27 | 2011-03-22 | Complete Genomics, Inc. | Efficient arrays of amplified polynucleotides |
US20090026082A1 (en) | 2006-12-14 | 2009-01-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US20100282617A1 (en) | 2006-12-14 | 2010-11-11 | Ion Torrent Systems Incorporated | Methods and apparatus for detecting molecular interactions using fet arrays |
US20090127589A1 (en) | 2006-12-14 | 2009-05-21 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US8053192B2 (en) | 2007-02-02 | 2011-11-08 | Illumina Cambridge Ltd. | Methods for indexing samples and sequencing multiple polynucleotide templates |
US9079148B2 (en) | 2008-07-02 | 2015-07-14 | Illumina Cambridge Limited | Using populations of beads for the fabrication of arrays on surfaces |
US20100137143A1 (en) | 2008-10-22 | 2010-06-03 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
US20100120098A1 (en) | 2008-10-24 | 2010-05-13 | Epicentre Technologies Corporation | Transposon end compositions and methods for modifying nucleic acids |
US20130274117A1 (en) | 2010-10-08 | 2013-10-17 | President And Fellows Of Harvard College | High-Throughput Single Cell Barcoding |
WO2012061832A1 (en) | 2010-11-05 | 2012-05-10 | Illumina, Inc. | Linking sequence reads using paired code tags |
US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
US8951781B2 (en) | 2011-01-10 | 2015-02-10 | Illumina, Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
US20120208724A1 (en) | 2011-02-10 | 2012-08-16 | Steemers Frank J | Linking sequence reads using paired code tags |
US20120208705A1 (en) | 2011-02-10 | 2012-08-16 | Steemers Frank J | Linking sequence reads using paired code tags |
US8778848B2 (en) | 2011-06-09 | 2014-07-15 | Illumina, Inc. | Patterned flow-cells useful for nucleic acid analysis |
US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
US8778849B2 (en) | 2011-10-28 | 2014-07-15 | Illumina, Inc. | Microarray fabrication system and method |
US20130184796A1 (en) | 2012-01-16 | 2013-07-18 | Greatbatch Ltd. | Elevated Hermetic Feedthrough Insulator Adapted for Side Attachment of Electrical Conductors on the Body Fluid Side of an Active Implantable Medical Device |
US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
US20130338042A1 (en) | 2012-06-15 | 2013-12-19 | Illumina, Inc. | Kinetic exclusion amplification of nucleic acid libraries |
US8895249B2 (en) | 2012-06-15 | 2014-11-25 | Illumina, Inc. | Kinetic exclusion amplification of nucleic acid libraries |
US9169513B2 (en) | 2012-06-15 | 2015-10-27 | Illumina, Inc. | Kinetic exclusion amplification of nucleic acid libraries |
US20140243224A1 (en) | 2013-02-26 | 2014-08-28 | Illumina, Inc. | Gel patterned surfaces |
WO2015002813A1 (en) | 2013-07-01 | 2015-01-08 | Illumina, Inc. | Catalyst-free surface functionalization and polymer grafting |
WO2015106941A1 (en) | 2014-01-16 | 2015-07-23 | Illumina Cambridge Limited | Polynucleotide modification on solid support |
WO2016066586A1 (en) | 2014-10-31 | 2016-05-06 | Illumina Cambridge Limited | Novel polymers and dna copolymer coatings |
WO2016130704A2 (en) | 2015-02-10 | 2016-08-18 | Illumina, Inc. | Methods and compositions for analyzing cellular components |
WO2018018008A1 (en) | 2016-07-22 | 2018-01-25 | Oregon Health & Science University | Single cell whole genome libraries and combinatorial indexing methods of making thereof |
US20180305753A1 (en) | 2017-04-23 | 2018-10-25 | Illumina Cambridge Limited | Compositions and methods for improving sample identification in indexed nucleic acid libraries |
WO2019236599A2 (en) | 2018-06-04 | 2019-12-12 | Illumina, Inc. | High-throughput single-cell transcriptome libraries and methods of making and of using |
Non-Patent Citations (47)
Title |
---|
"Methods in Molecular Biology", vol. 192, HUMANA PRESS, article "PCR Cloning Protocols, Second Edition" |
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410 |
BENTLEY ET AL., NATURE, vol. 456, 2008, pages 53 - 59 |
BRANSTEITTER ET AL., PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 100, no. 7, 2003, pages 4102 - 7 |
COCKROFT, S. L.; CHU, J.; AMORIN, M.; GHADIRI, M. R.: "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution.", J. AM. CHEM. SOC., vol. 130, 2008, pages 818 - 820, XP055097434, DOI: 10.1021/ja077082c |
DEAMER, D. W.; AKESON: "Nanopores and nucleic acids: prospects for ultrarapid sequencing.", TRENDS BIOTECHNOL, vol. 18, 2000, pages 147 - 151, XP004194002, DOI: 10.1016/S0167-7799(00)01426-8 |
DEAMER, D.; BRANTON, D.: "Characterization of nucleic acids by nanopore analysis", ACC. CHEM. RES, vol. 35, 2002, pages 817 - 825, XP002226144, DOI: 10.1021/ar000138m |
DEAN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 99, 2002, pages 5261 - 66 |
DRMANAC ET AL., SCIENCE, vol. 327, no. 5961, 2010, pages 78 - 81 |
FRESHNEY: "Current Protocols in Molecular Biology", 1994, WILEY-LISS |
FULLGRABE ET AL.: "Simultaneous sequencing of genetic and epigenetic bases in DNA", NAT BIOTECHNOL, 2023 |
GAP, BESTFIT, FASTA, TFASTA, WISCONSIN GENETICS SOFTWARE PACKAGE, GENETICS COMPUTER GROUP |
GORYSHIN; REZNIKOFF, J. BIOL. CHEM., vol. 273, 1998, pages 7367 |
GREAGG ET AL., PNAS USA, vol. 96, no. 16, 1999, pages 9045 - 50 |
HARRIS; ANGAL: "Protein Purification Applications: A Practical Approach", vol. 182, 1990, ACADEMIC PRESS, INC. N.Y |
HEALY, K: "Nanopore-based single-molecule DNA analysis.", NANOMED, vol. 2, 2007, pages 459 - 481, XP009111262, DOI: 10.2217/17435889.2.4.459 |
KORLACH, J ET AL.: "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures.", PROC. NATL. ACAD. SCI. USA, vol. 105, 2008, pages 1176 - 1181 |
LAGE ET AL., GENOME RES, vol. 13, 2003, pages 294 - 307 |
LEVENE, M. J ET AL.: "Zero-mode waveguides for single-molecule analysis at high concentrations.", SCIENCE, vol. 299, 2003, pages 682 - 686, XP002341055, DOI: 10.1126/science.1079700 |
LI, J.; GERSHOW, M.; STEIN, D.; BRANDIN, E.; GOLOVCHENKO, J. A.: "DNA molecules and configurations in a solid-state nanopore microscope", NAT. MATER, vol. 2, 2003, pages 611 - 615, XP009039572, DOI: 10.1038/nmat965 |
LIZARDI ET AL., NAT. GENET, vol. 19, 1998, pages 225 - 232 |
LUNDQUIST, P. M ET AL.: "Parallel confocal detection of single molecules in real time.", OPT. LETT, vol. 33, 2008, pages 1026 - 1028, XP001522593, DOI: 10.1364/OL.33.001026 |
METZKER, GENOME RES, vol. 15, 2005, pages 1767 - 1776 |
MIZUUCHI, K., CELL, vol. 35, 1983, pages 785 |
NEEDLEMAN; WUNSCH, J. MOL. BIOL, vol. 48, 1970, pages 443 |
PEARSON; LIPMAN, PROC. NAT'L. ACAD. SCI. USA, vol. 85, 1988, pages 2444 |
R. SCOPES: "Protein Purification", 1982, SPRINGER-VERLAG |
RANDAU ET AL.: "A cytidine deaminase edits C to U in transfer RNAs in Archaea", SCIENCE, vol. 324, no. 5927, 2009, pages 657 - 659 |
RONAGHI, M.; UHLEN, M.; NYREN, P.: "A sequencing method based on real-time pyrophosphate.", SCIENCE, vol. 281, no. 5375, 1998, pages 363, XP002135869, DOI: 10.1126/science.281.5375.363 |
RONAGHI, M.: "Pyrosequencing sheds light on DNA sequencing.", GENOME RES, vol. 11, no. 1, 2001, pages 3 - 11, XP000980886, DOI: 10.1101/gr.11.1.3 |
RONAGHI, M.; KARAMOHAMED, S.; PETTERSSON, B.; UHLEN, M.; NYREN, P.: "Real-time DNA sequencing using detection of pyrophosphate release.", ANALYTICAL BIOCHEMISTRY, vol. 242, no. 1, 1996, pages 84 - 9, XP002388725, DOI: 10.1006/abio.1996.0432 |
RUPAREL ET AL., PROC NATL ACAD SCI USA, vol. 102, 2005, pages 5932 - 7 |
SAM ET AL., PLOS ONE, vol. 13, no. 6, 2018 |
SAMBROOK ET AL.: "Molecular cloning: A Laboratory Manual", COLD SPRING HARBOR LABORATORY, 1989 |
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual.", 1989, COLD SPRING HARBOR LABORATORY PRESS |
SANDANA: "Bioseparation of Proteins", 1997, ACADEMIC PRESS, INC. |
SAVILAHTI, H ET AL., EMBO J., vol. 14, 1995, pages 4893 |
SCHUTSKY ET AL., NATURE BIOTECHNOLOGY |
SCHUTSKY ET AL., NUCLEIC ACID RESEARCH, vol. 45, 2017, pages 7655 - 7665 |
SCOPES: "Protein Purification: Principles and Practice 3rd Edition", 1993, SPRINGER VERLAG, NY |
SMITHWATERMAN, ADV. APPL. MATH, vol. 2, 1981, pages 482 |
SONI, G. VMELLER: "A. Progress toward ultrafast DNA sequencing using solid-state nanopores.", CLIN. CHEM., vol. 53, 2007, pages 1996 - 2001, XP055076185, DOI: 10.1373/clinchem.2007.091231 |
SRINIVASAN ET AL., AM J PATHOL, vol. 161, no. 6, December 2002 (2002-12-01), pages 1961 - 1971 |
TROLL ET AL., BMC GENOMICS, vol. 20, 2019, pages 1023 |
WALKER ET AL., NUCL. ACIDS RES, vol. 20, 1992, pages 1691 - 96 |
WALKER ET AL.: "Molecular Methods for Virus Detection", 1995, ACADEMIC PRESS, INC |
WANG ET AL.: "Direct enzymatic sequencing of 5-methylcytosine at single-base resolution", NAT CHEM BIOL, vol. 19, 2023, pages 1004 - 1012, XP093149495, DOI: 10.1038/s41589-023-01318-1 |
Also Published As
Publication number | Publication date |
---|---|
WO2025081064A3 (en) | 2025-05-15 |
Similar Documents
Publication | Title |
---|---|
US20240182881A1 (en) | Altered cytidine deaminases and methods of use |
AU2022202505A1 (en) | Compositions And Methods For Improving Sample Identification In Indexed Nucleic Acid Libraries |
JP7587425B2 (en) | Methods and compositions for paired-end sequencing using a single surface primer |
KR20190034164A (en) | Single cell whole genomic libraries and combinatorial indexing methods for their production |
KR20220162873A (en) | Contiguity preserving transposition |
JP7569690B2 (en) | Methods and compositions for generating clusters by bridge amplification |
US20210380972A1 (en) | Methods for increasing yield of sequencing libraries |
JP7662340B2 (en) | Methods for improving the clonality of polynucleotide clusters |
EP4594482A1 (en) | Cytidine deaminases and methods of use in mapping modified cytosine nucleotides |
WO2021252617A1 (en) | Methods for increasing yield of sequencing libraries |
WO2024069581A1 (en) | Helicase-cytidine deaminase complexes and methods of use |
WO2025081064A2 (en) | Thermophilic deaminase and methods for identifying modified cytosine |
WO2024073043A1 (en) | Methods of using cpg binding proteins in mapping modified cytosine nucleotides |
WO2025072783A1 (en) | Altered cytidine deaminases and methods of use |
WO2025137222A1 (en) | Methylation detection assay |
HK40081745A (en) | Methods for increasing yield of sequencing libraries |
WO2024249466A1 (en) | False positive reduction by translesion polymerase repair |
WO2024249200A1 (en) | Methods for preserving methylation status during clustering |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24801377; Country of ref document: EP; Kind code of ref document: A2 |