WO2009098037A1 - Paired end sequencing - Google Patents
- Publication number
- WO2009098037A1 (PCT/EP2009/000741)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- dna
- adaptor
- target nucleic
- sequencing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6855—Ligating adaptors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
Definitions
- the present invention is related to the field of nucleic acid sequencing, genomic sequencing, and the assembly of the sequencing results into a contiguous sequence.
- One approach to sequencing a large target nucleic acid is the use of shotgun sequencing.
- In shotgun sequencing, the target nucleic acid is fragmented or subcloned to produce a series of overlapping nucleic acid fragments, and the sequence of these fragments is determined. Based on the overlap and the knowledge of the sequence of each fragment, the complete sequence of a target nucleic acid can be constructed.
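The overlap-based reconstruction described above can be illustrated with a minimal sketch. It assumes error-free reads that are already in left-to-right order; the function names and the toy reads are hypothetical and are not taken from the patent.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_ordered_reads(reads, min_overlap=3):
    """Merge reads that are already ordered left to right into one contig."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap_length(contig, read, min_overlap)
        if size == 0:
            # Without an overlap the reads cannot be joined: this is a gap.
            raise ValueError("no overlap between consecutive reads")
        contig += read[size:]
    return contig


# Toy reads (assumed data); each overlaps its neighbour by four bases.
reads = ["ATGGCGTAC", "GTACCTTAG", "TTAGGACCA"]
print(merge_ordered_reads(reads))
```

A real assembler must also order the reads and tolerate sequencing errors; repeats longer than a read make the overlaps ambiguous.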
- One disadvantage of the shotgun approach to sequencing is that assembly may be difficult if the target nucleic acid sequence comprises numerous small repeats (tandem or inverted repeats). The inability to assemble a genomic sequence in repeat regions leads to gaps in the assembled sequence. Thus, following initial assembly of a nucleic acid sequence, gaps in sequence coverage would need to be filled and uncertainties in assembly would need to be resolved.
- One method of resolving these gaps is to use larger clones or fragments for sequencing because these larger fragments would be long enough to span the repeat regions.
- However, the sequencing of large fragments of nucleic acid is more difficult and time consuming in current sequencing apparatus.
- Another approach to spanning a gap in the sequence is to determine the sequence of both ends of a large fragment.
- A pair of sequence reads from both ends of a fragment has known spacing and orientation.
- the use of relatively long fragments also aids in the assembly of sequences containing interspersed repetitive elements.
- This type of approach (Smith, M.W. et al., Nature Genetics 7: 40-47 (1994)) is known in the art as paired end sequencing.
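Because the two reads of a pair have known spacing and orientation, a pair whose reads land on two different contigs gives an estimate of the gap between them. The sketch below is an illustration only: the coordinate convention, the function name, and the numbers are assumptions, not part of the patent.

```python
def estimate_gap(fragment_length, contig_a_length, fragment_start_on_a, fragment_end_on_b):
    """Estimate the gap (in bases) between contig A and contig B.

    fragment_length     -- known spacing between the outer ends of the read pair
    contig_a_length     -- length of contig A
    fragment_start_on_a -- 0-based position on contig A where the fragment starts
    fragment_end_on_b   -- number of bases at the start of contig B covered by the fragment
    """
    covered_on_a = contig_a_length - fragment_start_on_a
    covered_on_b = fragment_end_on_b
    return fragment_length - covered_on_a - covered_on_b


# Assumed example: a 10 kb fragment covers the last 4,000 bases of contig A
# and the first 2,500 bases of contig B.
print(estimate_gap(10_000, 50_000, 46_000, 2_500))
```

A negative result would indicate that the two contigs overlap rather than being separated by a gap.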
- the present invention includes novel methods, systems and compositions useful for paired-end sequencing approaches and other nucleic acid technologies.
- One embodiment of the invention is directed to a method for obtaining, in an in vitro reaction, a DNA construct comprising two end regions of a target nucleic acid, which can be a large segment from the genome of an organism.
- the method comprises the following steps:
- An embodiment of a method for obtaining a DNA construct comprising two end regions of a target nucleic acid in an in vitro reaction comprises the steps of: fragmenting a large nucleic acid molecule to produce a target nucleic acid molecule; ligating a recombination adaptor element to each end of the target nucleic acid molecule to produce an adapted target nucleic acid molecule; exposing the adapted target nucleic acid to a site specific recombinase to produce a circular nucleic acid product and a linear nucleic acid product from the adapted target nucleic acid, wherein the circular nucleic acid product comprises the target nucleic acid molecule; and fragmenting the circular nucleic acid product to produce a template nucleic acid molecule comprising a sequence region from each end of the target nucleic acid molecule.
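The effect of the circularization and second fragmentation steps can be sketched on plain strings. This is a toy model under stated assumptions: the hybrid adaptor is a placeholder marker, the circle is broken at exactly two chosen points, and all names are hypothetical.

```python
ADAPTOR = "<lox>"  # placeholder for the hybrid recombination adaptor


def circularize(target):
    """Model the circular product: the target closed by one hybrid adaptor.

    The returned string is read as a circle, so its last character is
    joined to its first character.
    """
    return target + ADAPTOR


def template_from_circle(circle, cut_a, cut_b):
    """Break the circle at two points inside the target and return the
    piece that carries the adaptor."""
    target_length = len(circle) - len(ADAPTOR)
    if not 0 < cut_a < cut_b < target_length:
        raise ValueError("both cuts must fall inside the target")
    # The adaptor-containing piece runs from cut_b, through the adaptor,
    # and wraps around to cut_a.
    return circle[cut_b:] + circle[:cut_a]


target = "AAAACCCCGGGGTTTT"
template = template_from_circle(circularize(target), cut_a=4, cut_b=12)
print(template)
```

In this model the template holds the last four bases of the target, the adaptor, and then the first four bases of the target, so the two originally distant ends sit next to each other.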
- the method further comprises the step of removing the non-circular molecules using an exonuclease.
- The method further comprises the steps of adding a plurality of circular carrier DNA molecules to the circular nucleic acid product; fragmenting the circular nucleic acid product and the carrier DNA molecules to produce the template molecule and a plurality of linear carrier molecules; determining the efficiency of the fragmentation from the template molecule and the linear carrier molecules; amplifying the template molecule to produce a population comprising a plurality of substantially identical copies, wherein the linear carrier molecules are un-amplifiable; and sequencing the population to produce sequence data comprising the sequence composition of the template nucleic acid.
- the methods of the invention may be performed simultaneously on a plurality of target DNA fragments to produce a library of DNA constructs which contain the ends from a large fragment of DNA.
- One advantage of the invention is that a library may be constructed in vitro without the use of prokaryotic or eukaryotic host cells.
- the present invention is directed to a method for obtaining a DNA construct comprising two end regions of a target nucleic acid in an in vitro reaction comprising the steps of - fragmenting a nucleic acid to produce a target nucleic acid molecule;
- the nucleic acid that is being fragmented may consist of very large molecules.
- said nucleic acid may be genomic DNA, which has not been substantially sheared or otherwise prefragmented previously.
- the new method is especially suitable for target nucleic acid molecules comprising a length selected from the group consisting of at least 3Kb, at least 8Kb, at least 10Kb, at least 20Kb, at least 50Kb, and at least 100Kb.
- A prominent example of a site specific recombinase that can be used in the context of the present invention is Cre recombinase.
- a preferred method for fragmenting the circular nucleic acid product comprises the step of nebulization.
- the step of fragmenting the circular nucleic acid product further comprises a first break of the circular nucleic acid product using a type II restriction enzyme and a second break using the nebulization, wherein the type II restriction enzyme cuts at a restriction site in a hybrid adaptor region of the circular nucleic acid product and produces a short sequence region from the target nucleic acid and the nebulization produces a long sequence region from the target nucleic acid.
- the type II restriction enzyme comprises MmeI and the short sequence region comprises a 20 bp sequence length.
- the method further comprises the step of removing the non-circular molecules.
- the non-circular molecules comprise the linear nucleic acid product and an adaptor dimer product, wherein the adaptor dimer product is generated from a ligation of two of the recombination adaptor elements to each other.
- the method further comprises the step of removing the non-circular molecules using at least one exonuclease. Also preferably, the method further comprises the steps of
- sequence data comprising the sequence composition of the template nucleic acid.
- the circular carrier molecules comprise pUC19. Also particularly preferred, the circular carrier molecules comprise damaged DNA wherein the damaged DNA is un-amplifiable. The damaged DNA may then comprise a type of damage selected from the group consisting of UV damage, alkylation/methylation, X-ray damage, hydrolysis, and oxidative damage.
- the inventive method further comprises the steps of:
- the method further comprises the step of ligating a second set of adaptor elements to the template nucleic acid molecule, wherein the second set of adaptor elements comprises a first primer element and a second primer element and wherein the step of amplifying employs the first primer element and the step of sequencing employs the second primer element.
- the sequence composition of the template nucleic acid comprises a sequence composition for each of the sequence regions from the ends of the target molecule.
- the recombination adaptor elements comprise a first recombination adaptor element and a second recombination adaptor element, wherein the first and second recombination adaptor elements both comprise a directional element.
- the circular nucleic acid product and the linear nucleic acid product are produced when the directional elements in the first and second recombination adaptor elements are in an identical directional relationship.
- the first and second recombination adaptor elements may then each comprise a blunt end that ligates to the target nucleic acid molecule in an orientation that promotes the identical directional relationship of the directional elements.
- the first and second recombination adaptor elements comprise an overhang end that inhibits formation of adaptor concatemers.
- the directional element comprises a lox sequence element.
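The dependence of the recombination products on adaptor directionality can be written as a small lookup. The same-direction case follows the text above; the opposite-direction case (inversion of the intervening segment) reflects the general behaviour of Cre acting on two lox sites within one molecule and is an assumption that is not stated in this section. The function name and the "+"/"-" labels are hypothetical.

```python
def recombination_products(first_direction, second_direction):
    """Predict the products of site specific recombination between the two
    directional elements of one adapted target molecule.

    Each direction is "+" or "-".
    """
    for direction in (first_direction, second_direction):
        if direction not in ("+", "-"):
            raise ValueError("direction must be '+' or '-'")
    if first_direction == second_direction:
        # Identical directional relationship: the target is excised as a circle.
        return ["circular product carrying the target", "linear product"]
    # Assumed general Cre/lox behaviour: the segment between the sites is inverted.
    return ["linear molecule with inverted target"]


print(recombination_products("+", "+"))
print(recombination_products("+", "-"))
```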
- the first and second recombination adaptor elements comprise a palindromic sequence element flanking both ends of the directional element.
- the circular nucleic acid product comprises a first hybrid recombination adaptor and the linear nucleic acid product comprises a second hybrid recombination adaptor, wherein the first and second hybrid recombination adaptors comprise elements from the ligated recombination adaptors.
- the template nucleic acid comprises the first hybrid recombination adaptor positioned between the end sequence regions.
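Since the hybrid adaptor sits between the two end sequence regions, a read of the template can be split at the adaptor to recover the two ends. A minimal sketch, assuming an exact adaptor match and no sequencing errors; the adaptor string here is a made-up placeholder, not one of the patent's SEQ ID sequences.

```python
ADAPTOR = "GGCCGGCC"  # hypothetical placeholder sequence


def split_paired_read(read, adaptor=ADAPTOR):
    """Return (left_end, right_end) flanking the adaptor, or None if the
    adaptor is not found in the read."""
    index = read.find(adaptor)
    if index == -1:
        return None
    left_end = read[:index]
    right_end = read[index + len(adaptor):]
    return left_end, right_end


read = "ACGTAC" + ADAPTOR + "TTGCAA"
print(split_paired_read(read))
print(split_paired_read("ACGTACTTGCAA"))
```

Reads without the adaptor return `None` and can be handled as ordinary single reads or discarded.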
- the template nucleic acid comprises at least one enrichment tag associated with the first hybrid recombination adaptor.
- Said enrichment tag may for example be a Biotin tag.
- the present invention is directed to a method for obtaining a plurality of DNA constructs comprising two end regions of a target nucleic acid in an in vitro reaction comprising the steps of:
- kits for performing the methods disclosed above comprising:
- the site specific recombinase is preferably Cre recombinase.
- such a kit may comprise
- Figure 1 depicts a schematic representation of one embodiment of the paired-end sequencing strategy.
- the numeric labels indicate the origin of the nucleic acids.
- "101" denotes one flanking region of the capture element, shown, for example, on the left side of Figure 3 A.
- "102" denotes a second flanking region of the capture element, shown, for example, on the right side of Figure 3 A.
- "103" denotes the capture element.
- "104" denotes fragmented (and optionally size fractionated) starting nucleic acid.
- "105" denotes a separator element.
- "106" denotes polymerase.
- Figure 2 depicts a schematic representation of a second embodiment of the paired-end sequencing strategy.
- FIG. 3 depicts the sequence and design of capture fragments. The identities of the sequences are as follows:
- Oligo 4: SEQ ID NO:5
- Paired-end capture fragment product (type IIS, MmeI): SEQ ID NO:6
- Short adaptor paired end capture fragment (type IIS, MmeI): SEQ ID NO:8
- Figure 4 depicts one embodiment of a RE fragment.
- Figure 5 depicts another embodiment of a RE fragment.
- Figure 6 depicts a paired end read approach using a hairpin adaptor.
- the hairpin adaptor has the following sequence:
- the hairpin adaptor is one continuous nucleic acid sequence, which is depicted as separated into 4 regions above. The four regions are, from left to right, the hairpin region, restriction endonuclease recognition site, a biotinylated region, and a type IIS restriction endonuclease recognition site.
- "601" denotes the hairpin adaptor.
- "603" denotes genomic DNA.
- Met denotes methylated DNA.
- "602" denotes hairpin adaptor dimers.
- "604" denotes hairpin adaptor cleaved by restriction endonuclease.
- "605" denotes two hairpin adaptors cleaved by restriction endonuclease and religated.
- SA denotes streptavidin bead.
- Bio denotes biotin (e.g., biotinylated DNA).
- Figure 7 depicts improvements to a paired end procedure.
- Figure 8 depicts a paired-end read approach with overhang adaptor.
- Figure 9 depicts "tag primed" double-ended sequencing, which is one method for sequencing the products of the invention.
- Figure 10 depicts adaptor linked circularization.
- Figure 11 depicts ssDNA based circularization.
- Figure 12 depicts a schematic representation of another embodiment of the paired-end sequencing strategy - Paired-Reads PET Random Fragmentation.
- SPRI refers to solid-phase reversible immobilization.
- Figure 13 depicts Paired-Reads PET Random Fragmentation sequencing data from sequencing E. coli K12.
- Figure 14 depicts various methods of double stranded DNA cleavage by E. coli Endonuclease V.
- the boxed nucleotides "I" represent deoxyinosine.
- Figure 14 A depicts a method in which the nucleotide sequence of the double-stranded DNA directs double-stranded cleavage by E. coli Endonuclease V in a manner which results in a 3' single-stranded palindromic overhang. Note that 3' single-stranded overhangs contain a Deoxyinosine residue.
- Figure 14 B depicts a method in which the nucleotide sequence of the double-stranded DNA directs double-stranded cleavage by E. coli Endonuclease V in a manner which results in a 3' single-stranded non-palindromic overhang. Note that 3' single-stranded overhangs contain a Deoxyinosine residue.
- Figure 14 C depicts a method in which the nucleotide sequence of the double-stranded DNA directs double-stranded cleavage by E. coli Endonuclease V in a manner which results in a 5' single-stranded palindromic overhang. Note that 5' single-stranded overhangs do not contain a Deoxyinosine residue.
- Figure 14 D depicts a method in which the nucleotide sequence of the double-stranded DNA directs double-stranded cleavage by E. coli Endonuclease V in a manner which results in a 5' single-stranded non-palindromic overhang. Note that 5' single-stranded overhangs do not contain a Deoxyinosine residue.
- Figure 14 E depicts a method in which the nucleotide sequence of the double-stranded DNA directs double-stranded cleavage by E. coli Endonuclease V in a manner which results in a blunt end.
- Figure 15 depicts a schematic representation of another embodiment of the paired-end sequencing strategy with double-stranded cleavage by E. coli Endonuclease V of a hairpin adaptor containing Deoxyinosines on opposing strands (Deoxyinosine Hairpin Adaptor).
- Figure 16 depicts the distribution of Paired-Read distances obtained from sequencing of E. coli K12 genomic DNA using the Deoxyinosine Hairpin Adaptor method depicted in Figure 15.
- Figure 17 depicts a schematic representation of another embodiment of the paired end
- Nucleotide sequences of the hairpin adaptor, the paired end adaptors ("A" and "B") and the PCR primers "F-PCR" and "R-PCR" are shown in Figure 18.
- Each of the paired end adaptors has double-stranded and single-stranded portions as shown in Figure 18.
- Bio denotes biotin.
- Met denotes a methylated base.
- SA-beads denotes streptavidin-coated microparticles.
- "EcoRI" and "MmeI" denote recognition sites for the restriction endonucleases EcoRI and MmeI, respectively.
- Figure 18 depicts the nucleotide sequences and modifications of the adaptor and primer oligonucleotides shown in Figure 17.
- Figure 18 A depicts the hairpin adaptor sequence.
- iBiodT denotes internal biotin-labeled deoxythymine.
- Bio denotes biotin.
- "EcoRI" and "MmeI" denote recognition sites for the restriction endonucleases EcoRI and MmeI, respectively.
- Figure 18 B depicts the paired end adaptor and PCR primer nucleotide sequences.
- Each of the paired end adaptors ("A" and "B") is produced by annealing of two single stranded oligonucleotides: "A top" and "A bottom", or "B top" and "B bottom".
- the 5' ends of the polynucleotide sequences shown in Figure 18 B are not phosphorylated.
- Figure 19 depicts a schematic representation of one embodiment of a method for polynucleotide ligation in water-in-oil emulsion.
- Figure 20 depicts a graph of the depth of coverage of E. coli K12 genomic DNA achieved by paired end sequencing data obtained with or without MmeI-site containing carrier DNA.
- Figure 21 depicts a schematic representation of one embodiment of a method for a recombination based paired end strategy.
- Figure 22 depicts one embodiment of adaptors useful for the recombination based strategy of Figure 21 and an adaptor product generated therefrom.
- SEQ ID Nos 57-64 are depicted herein, in order of appearance.
- Figure 23 depicts a schematic representation of the products of the recombination based strategy of Figure 21 based upon adaptor directionality.
- Figure 24 depicts the distribution of Paired-Read distances obtained from sequencing of E. coli K12 genomic DNA based, at least in part, upon the recombination based method depicted in Figure 21.
- Figure 25 depicts a schematic representation of the advantage provided from sequence information generated from long paired end fragments produced using the recombination based method of Figure 21.
- the invention is directed to a fast and cost effective method for isolating and sequencing both ends of a large fragment of nucleic acid.
- the method is fast and amenable to automation and allows the sequencing and linkage of large fragments of DNA.
- Paired end sequencing holds a number of important advantages compared to conventional clone-by-clone shotgun sequencing, and is in fact complementary to it. Foremost among these advantages is the ability to quickly produce a scaffolding of a large genome even when the genome is interspersed with repetitive elements.
- the method of the invention can be used to produce a library of DNA fragments from an in vitro reaction wherein the fragments contain the ends from a larger fragment of DNA. Even further, the method of the invention can be used to assemble an entire genomic scaffold using a minimal sequencing effort by employing a paired distance spacing between those ends that is at least 10 kb.
- paired-end sequencing may be performed in the following steps:
- the starting material may be any nucleic acid including, for example, genomic DNA, cDNA, RNA, PCR products, episomes and the like. While the methods of the invention are especially effective for long stretches of nucleic acid starting material, the invention is also applicable to small nucleic acids such as a cosmid, plasmid, small PCR products, mitochondrial DNA etc.
- the DNA may be from any source.
- the DNA may be from the genome of an organism whose DNA sequence is unknown, or not completely known.
- the DNA may be from the genome of an organism whose DNA sequence is known. Sequencing the DNA of a known genome allows researchers to gather data on genomic polymorphisms and to correlate genotype with disease.
- the nucleic acid starting material may be of a known size or known range of sizes.
- the starting material may be a cDNA library or a genomic library where the average insert size and distribution is known.
- the nucleic acid starting material may be fragmented (Figure 1A) by any one of a number of commonly used methods including nebulization, sonication, HydroShear, ultrasonic fragmentation, enzymatic cleavage (e.g., DNase treatment, including limited DNase treatment; RNase treatment, including limited RNase treatment; and digestion with restriction endonucleases), use of a prefragmented library (such as a cDNA library), chemically induced fragmentation (e.g., with NaOH), heat induced fragmentation, and transposon mediated mutation, which can introduce cleavage sites such as restriction endonuclease cleavage sites throughout a DNA sample. See, Goryshin I.Y.
- the DNA ends may require polishing. That is, the double stranded DNA ends may need to be treated to make them blunt ended and suitable for ligation. This step will vary in an art known manner depending on the fragmentation method. For example, mechanically sheared DNA can be polished using Bal31 to cleave the sequence overhangs, and a polymerase such as Klenow or T4 polymerase, together with dNTPs, may be used to fill in to produce blunt ends.
- the nucleic acid fragments may be size fractionated to reduce this size variation.
- Size fractionation is an optional step that may be performed by a number of art known methodologies. Methods for size fractionation include gel methods such as pulse gel electrophoresis, and sedimentation through a sucrose gradient or a cesium chloride gradient, and size exclusion chromatography (gel permeation chromatography). The choice of selected size range will be based on the length of the region to be spanned by paired-end sequencing.
- the size fractionated DNA fragments have a size distribution in which the fragment sizes are within 25% of each other.
- For example, a 5 Kb size fraction would comprise fragments which are 5 Kb +/- 1 Kb (i.e., 4 Kb to 6 Kb) and a 50 Kb size fraction would comprise fragments which are 50 Kb +/- 10 Kb (i.e., 40 Kb to 60 Kb).
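The size windows in the examples above can be reproduced with a short calculation. This sketch assumes the +/- 20% tolerance implied by the two worked examples (5 Kb +/- 1 Kb and 50 Kb +/- 10 Kb); the function names and the fragment lengths are hypothetical.

```python
def size_fraction_window(nominal_bp, tolerance_percent=20):
    """Return the (minimum, maximum) fragment length in bp for a size fraction."""
    delta = nominal_bp * tolerance_percent // 100
    return nominal_bp - delta, nominal_bp + delta


def select_fraction(fragment_lengths_bp, nominal_bp, tolerance_percent=20):
    """Keep only the fragment lengths that fall inside the size window."""
    low, high = size_fraction_window(nominal_bp, tolerance_percent)
    return [length for length in fragment_lengths_bp if low <= length <= high]


print(size_fraction_window(5_000))
print(size_fraction_window(50_000))
print(select_fraction([3_500, 4_200, 5_900, 7_000], 5_000))
```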
- a capture element is a linear double stranded nucleic acid - which may have single stranded ends or double stranded ends used for ligating the nucleic acid fragment from the previous step.
- a "capture element” may be propagated as a circular nucleic acid (e.g., a plasmid as depicted in Figure 1C) which contains forward and reverse adaptor ends (depicted in Figure 1C as a thick region of the circle). This circular plasmid may be cleaved before the capture element is used.
- These adaptor ends contain nucleic acid sequences that can serve as hybridization sites for potential PCR primers and sequencing primers in subsequent steps.
- the capture element may comprise additional elements such as restriction endonuclease recognition and/or cleavage sites, antibiotic resistance markers, prokaryotic or eukaryotic origins of replication or a combination of these elements.
- antibiotic resistance markers include, without limitation, genes imparting resistance to ampicillin, tetracycline, neomycin, kanamycin, streptomycin, bleomycin, zeocin, chloramphenicol, among others.
- Prokaryotic origins of replication can include, among others, OriC and OriV.
- Eukaryotic origins of replication can include autonomously replicating sequences (ARS), but are not limited to these sequences.
- the capture element may contain restriction endonuclease recognition and/or cleavage sites (e.g., unique and rare sites are preferred) that can be used to digest subsequent nucleic acid products (step L) into small amplifiable (by PCR) fragments.
- Capture elements can also comprise markers or tags, such as biotin, for easy purification or enrichment of the nucleic acid for paired-end sequencing.
- the capture element is linearized using known techniques such as restriction endonuclease digestion (blunt or sticky ends can be used for different fragment preparation; see below and Figure 1 D).
- the capture element can be dephosphorylated or modified with topoisomerase for TA cloning.
- the capture element is ligated to the fragment (or size fractionated fragment) of step A or B to form a circular nucleic acid comprising one capture element and one fragment of the target DNA ( Figure 1 E).
- the capture element and the target DNA are joined by well-known methodologies such as ligation by DNA ligase or by topoisomerase cloning strategies.
- the result of the previous step yields a collection of capture elements ligated to a DNA fragment which can be of considerable size.
- the present step is used to delete a large internal region of the target DNA fragment to yield a cloned insert of a size that can be more amenable for automated DNA sequencing ( Figure 1 F).
- the captured genomic DNA (i.e., the circular nucleic acid produced by step E) is digested with one or more restriction endonucleases which may have one or more cleavage sites within the genomic DNA.
- any restriction endonuclease may be used for "internal cleavage" as long as the restriction endonuclease does not cut within the capture element.
- Internal cleavage refers to the cleavage that is internal to the target DNA and which does not cut the capture element.
- Internal cleavage restriction enzymes may be selected by designing the capture element so that it does not contain the cleavage sites of selected restriction endonucleases. Restriction endonucleases and their use are well known in the art and can readily be applied to the present method.
- a combination of multiple restriction enzymes, each restricted to internal cleavage may be employed to further reduce the size of the target DNA fragment.
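Choosing enzymes for internal cleavage amounts to checking that their recognition sites are absent from the capture element. The sketch below uses three well-known recognition sequences (EcoRI, BamHI, HindIII); the capture element sequence is invented for illustration and the function names are hypothetical.

```python
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def internal_cleavage_candidates(capture_element, sites=RECOGNITION_SITES):
    """List enzymes whose recognition site is absent from both strands of
    the capture element, so that they can only cut inside the target DNA."""
    usable = []
    for name, site in sites.items():
        on_top_strand = site in capture_element
        on_bottom_strand = reverse_complement(site) in capture_element
        if not (on_top_strand or on_bottom_strand):
            usable.append(name)
    return usable


capture_element = "TTGACGGATCCATTGCAAGCTA"  # assumed sequence containing a BamHI site
print(internal_cleavage_candidates(capture_element))
```

The reverse-complement check is redundant for palindromic sites such as these three, but it keeps the function correct for non-palindromic recognition sequences.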
- the genomic DNA is cut by one or more of these restriction endonucleases to within 50 to 150 bases of the capture element.
- a "separator element” which is a double stranded nucleic acid of known sequence is ligated between the ends of the digested genomic material of the previous step to form a circular nucleic acid ( Figure 1 G).
- This "separator element” serves two purposes. First, the separator element can comprise a priming site for rolling circle amplification of the minicircles (see below, step I). Second, since the sequence of the separator element is known, it can act as an identifier that marks the ends of the paired genomic ends (to enable trimming and easy software analysis of the linked ends). That is, during subsequent sequencing of the genomic fragment, the sequence of the separator element would signal that the entire genomic fragment has been sequenced.
- Such separator elements can also comprise additional elements such as restriction endonuclease recognition and/or cleavage sites, antibiotic resistance markers, prokaryotic or eukaryotic origins of replication or a combination of these elements.
- the optional presence of such elements as antibiotic resistance markers and origins of replication notwithstanding, one of the advantages of the methods of the present invention is that said methods do not require the use of host cells (e.g. E. coli) for the cloning, amplification or other manipulations of nucleic acids.
- the separator element can also be biotinylated or otherwise tagged with a marker or a tag for easy purification or enrichment of the nucleic acid for paired-end sequencing.
- the circular nucleic acid (i.e., minicircle) produced from the last step is rendered single-stranded. This is done using standard DNA denaturing techniques by changing salt, temperature or pH of the solution. Other DNA denaturing techniques are known to one of skill in the art. After denaturing, the DNA circles from the same minicircle may still be linked but this does not affect the methods of the invention (Figure 1 H).
- Step 1I
- a primer is annealed to the separator element, which comprises a sequence that can anneal to the primer.
- This separator sequence thus acts as initiator for rolling circle amplification ( Figure 1 I).
- Step 1J The sample is amplified by rolling circle amplification to generate long single-stranded products (Figure 1 J).
- One advantage of this rolling circle amplification step is that elements without a separator element will not amplify and elements that are not closed circles will amplify poorly.
- One or more capping oligos are annealed to single-stranded restriction sites that flank the forward and reverse adaptor (rendering them double stranded in these regions) ( Figure 1 L).
- the capping oligos may be complementary to at least part of the capture element, to at least part of the adaptor regions, or both.
- the capped single-stranded DNA is cut at the capped sites into small fragments (Figure 1 M). These small fragments have ends of known sequence and can be easily amplified using conventional amplification techniques such as PCR.
- paired-end sequencing may be performed in the following steps:
- Step 2A Fragmentation of sample DNA
- the fragmentation of target nucleic acid and size fractionation is the same as for the previous embodiment.
- Step 2B Methylation and end polishing.
- the fragmented target nucleic acid may be methylated by any methylase.
- Preferred methylase would be those that influence restriction endonuclease digestion.
- Methylases may be used in at least two different strategies. In one preferred embodiment, methylases enable cleavage by restriction endonucleases that cleave only at a methylated restriction site. In another preferred embodiment, methylases prevent cleavage by restriction endonucleases that only cleave unmethylated DNA.
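The two methylase strategies reduce to a simple rule relating the enzyme's requirement to the methylation state of the site. A minimal sketch; the requirement labels and the function name are hypothetical and no specific enzyme is implied.

```python
def is_cleaved(enzyme_requirement, site_is_methylated):
    """Decide whether a restriction site is cut.

    enzyme_requirement -- "methylated_only" for an enzyme that cleaves only
                          methylated sites, or "unmethylated_only" for an
                          enzyme that is blocked by methylation
    site_is_methylated -- True if a methylase has modified the site
    """
    if enzyme_requirement == "methylated_only":
        return site_is_methylated
    if enzyme_requirement == "unmethylated_only":
        return not site_is_methylated
    raise ValueError("unknown enzyme requirement")


# Strategy 1: methylation enables cleavage.
print(is_cleaved("methylated_only", site_is_methylated=True))
# Strategy 2: methylation protects the site from cleavage.
print(is_cleaved("unmethylated_only", site_is_methylated=True))
```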
- the step of end polishing is the same as described in the first method.
- an adaptor is ligated to the ends of the target nucleic acid fragments ( Figure 2, 1.) to produce a fragment with an adaptor at both ends.
- the adaptors may be of any size but a size of 10 to 30 bases is preferred, and a size of 12 to 15 bases is more preferred.
- the adaptors may comprise a blunt end and an incompatible sticky end (i.e., an end with a 5' overhang or 3' overhang).
- the sticky ends may be filled in with polymerase and dNTPs.
- the adaptors of this section may be a capture fragment. Examples of capture fragments are shown in Figures 4 and 5.
- the adaptors may be hairpin adaptors (Figure 6A).
- The use of hairpin adaptors (e.g., Figure 6) prevents concatemer formation because hairpin adaptors cannot form any multimers greater than a dimer.
- Another method for preventing concatemers is to use adaptors where the 5' end of one or both strands is not phosphorylated.
- adaptors that may be used include non-phosphorylated adaptors, which have the advantage of using fewer processing steps but which also require a phosphorylation step using a kinase.
- the adaptors may be methylated, or biotinylated or both.
- DNA fragments which are ligated to two hairpin adaptors may be purified using exonucleases.
- This exonuclease purification takes advantage of the fact that a double stranded DNA, ligated to a hairpin adaptor on both ends, is a DNA molecule without an exposed 5' or 3' end.
- Other DNAs in the ligation mixture such as a double stranded DNA fragment ligated to only one hairpin adaptor, an unligated DNA fragment and unligated adaptors are susceptible to exonucleases ( Figure 6 B).
- exposure of the ligation mixture to an exonuclease will remove most DNA except for DNA fragments ligated to two hairpin adaptors and hairpin adaptor dimers. Since the hairpin adaptor dimers are significantly smaller than the DNA fragments, they can be removed using known techniques such as a size fractionation column (e.g., spin column) or agarose or acrylamide gel electrophoresis, or one of the other polynucleotide size discriminating methods known in the art and/or discussed elsewhere in this disclosure.
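The exonuclease purification and size selection described above can be sketched as a toy filter. This is only an illustration under stated assumptions: the `Molecule` class, the example lengths and the 500 bp cutoff are hypothetical and do not come from the disclosure. The only rule modeled is that a molecule capped by a hairpin at both ends has no exposed 5' or 3' end and therefore resists exonuclease digestion.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    length_bp: int
    closed_ends: int  # number of ends (0, 1 or 2) capped by a hairpin loop

    def exonuclease_resistant(self):
        # Only a molecule with no exposed 5' or 3' end survives digestion.
        return self.closed_ends == 2


def exonuclease_treat(mixture):
    """Keep only molecules without any free end."""
    return [m for m in mixture if m.exonuclease_resistant()]


def size_select(mixture, min_bp):
    """Drop small species such as hairpin adaptor dimers."""
    return [m for m in mixture if m.length_bp >= min_bp]


# Hypothetical ligation mixture; all lengths are made up for illustration.
ligation_mix = [
    Molecule("fragment + 2 hairpins", 3000, 2),
    Molecule("fragment + 1 hairpin", 3000, 1),
    Molecule("unligated fragment", 3000, 0),
    Molecule("unligated hairpin adaptor", 40, 1),
    Molecule("hairpin adaptor dimer", 80, 2),
]

survivors = exonuclease_treat(ligation_mix)
purified = size_select(survivors, min_bp=500)
print([m.name for m in purified])
```

In this sketch the exonuclease step leaves two species (the doubly capped fragment and the adaptor dimer), and the size cut removes the dimer, mirroring the two-stage purification in the text.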
- the adaptors may be biotinylated to facilitate isolation/enrichment of tag carrying fragments.
- fragments containing the adaptor may be purified by annealing a capture oligonucleotide, complementary to the tag sequence, to the fragments.
- Step 2E Preparation of fragments for circularization
- the fragment is circularized.
- cleavage in the adaptor regions may be desirable for a number of reasons. For example, if hairpin adaptors are used, the DNA fragment will not self circularize because there are no free 5' or 3' ends. As another example, if the adaptors leave the DNA fragment with blunt ends, cleavage would allow the adaptors to have 5' or 3' overhangs and these overhangs (so called "sticky ends") greatly facilitate ligation efficiency. Furthermore, digestion in the adaptor region would allow selection of DNA fragments with two adaptors, one ligated at each end. This is because the adaptors can be designed such that cleavage with a restriction endonuclease would leave compatible sticky ends. After cleavage in the adaptor region, DNA fragments with only one adaptor (an undesirable species) would have one sticky end and one blunt end and would have difficulty in self circularization. Thus, only DNA fragments with adaptors at both ends would be circularized.
- Limiting cleavage to the adaptors may be accomplished with a number of methods.
- the adaptors are methylated and are ligated to unmethylated DNA. Then the construct is digested with a restriction endonuclease which only cleaves methylated DNA. Since only the adaptors are methylated, only the adaptors will be cleaved.
- the DNA fragments may be methylated and the adaptors are not methylated. Cleavage with a restriction endonuclease which only recognizes and cleaves unmethylated DNA will limit cleavage to the adaptors. This may be accomplished by using starting DNA which is already methylated, or by in vitro methylation.
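The second strategy (methylated target DNA, unmethylated adaptors) can be illustrated with a toy model. The sketch below is an assumption-laden illustration only: the function names, the toy sequences and the representation of methylation as a set of protected site positions are hypothetical, and GAATTC (the EcoRI site) is used merely as a familiar example of an enzyme blocked by methylation.

```python
def find_sites(sequence, site):
    """Return the start index of every occurrence of `site` in `sequence`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def cleavable_sites(sequence, site, methylated_positions):
    """Sites that an enzyme blocked by methylation could still cut."""
    return [p for p in find_sites(sequence, site) if p not in methylated_positions]


# Hypothetical toy sequences, not taken from the disclosure.
SITE = "GAATTC"
adaptor = "CCGAATTCGG"
target = "ATGAATTCTTACGGAATTCAA"
construct = adaptor + target + adaptor

# Methylate every site inside the target, leaving the adaptors unmethylated.
protected = {len(adaptor) + p for p in find_sites(target, SITE)}
cuts = cleavable_sites(construct, SITE, protected)
print(cuts)  # only the two adaptor sites remain cleavable
```

The construct carries four recognition sites, but only the two inside the adaptors are reported as cleavable, which is the behavior the methylation step is meant to achieve.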
- digestion of the adaptors is not required.
- digestion of the adaptors may be optional.
- DNA fragments may be treated to facilitate ligation/circularization.
- the blocking group may be removed or the phosphates may be added to make the fragment ready for ligation.
- Step 2F ligation of ends to form circularized fragment.
- a number of methods may be used for circularization.
- ligase is added to the reaction mixture with the appropriate ligase buffer and the DNA fragments are allowed to recircularize.
- ligations are performed at dilute DNA concentrations to promote self ligation and to discourage the formation of concatemers.
- ligations are performed in water-in-oil emulsions, wherein the aqueous droplets contain approximately one fragment to be circularized, as described elsewhere in this disclosure.
- a signature tag is ligated to the target nucleic acid fragment and the fragment is self circularized (see figure 2).
- the signature tag is a double stranded nucleic acid sequence of between 24 to 30 basepairs.
- This "signature tag" is similar to the "separator element" of the previous embodiment in that it can act as an identifier that marks the ends of the paired genomic ends (to enable trimming and easy software analysis of the linked ends).
- the sequence of the signature tag signals the boundary between the two ends of the target nucleic acid sequence.
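Because the signature tag marks the boundary between the two ends, a read that spans the tag can be split in software. The following is a minimal sketch, assuming exact matching of a known tag; the 24-base tag sequence and the function names are hypothetical, and a practical implementation would likely need to tolerate sequencing errors.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_signature_tag(read, tag):
    """Return (end_1, end_2) flanking the tag, or None if the tag is absent.

    The tag is searched in both orientations because a read may come
    from either strand.
    """
    for candidate in (tag, reverse_complement(tag)):
        idx = read.find(candidate)
        if idx != -1:
            return read[:idx], read[idx + len(candidate):]
    return None


# Hypothetical 24-base signature tag and toy read.
TAG = "ACGTTGCAAGCTTGGATCCGTACA"
read = "TTTTCCCC" + TAG + "GGGGAAAA"

print(split_at_signature_tag(read, TAG))
print(split_at_signature_tag("ACGTACGTACGT", TAG))
```

Reads for which the function returns `None` correspond to the fragments that do not contain a signature tag.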
- the target nucleic acid fragment is further digested or fragmented. Fragmentation may be performed using any fragmentation procedure listed in this disclosure. See, for example, Step 1A above. Alternatively, one or more restriction endonucleases may be used to digest the target DNA to produce fragments.
- a nebulizer is used to fragment the nucleic acids until the average fragment size is about 200 to 300 bps. As shown in Figure 2, some of these fragments would contain a signature tag while other fragments would not contain a signature tag.
- nucleic acid fragments may be sequenced using standard techniques. Methods for sequencing nucleic acid fragments are known. One preferred method of sequencing is described in International Patent Application No. WO 05/003375 filed January 28, 2004.
- Step 2H
- fragments containing the signature tag may be enriched from fragments without signature tags.
- One method for enrichment involves the use of biotinylated signature tags in the sample preparation step. After fragmentation, fragments that contain the signature tag would be biotinylated and may be purified using a streptavidin column or streptavidin beads in solution.
- nucleic acid fragments may be sequenced using standard techniques including automated techniques such as those described in International Patent Application No. WO 05/003375, filed January 28, 2004.
- Paired end sequencing may be performed by a third method.
- step A to step E may be performed as described in the second method (i.e., as steps 2A to 2E).
- each adaptor comprises a type IIS restriction endonuclease site which can direct DNA cleavage about 15 to 25 bps away from the restriction endonuclease recognition site. It is known that different type IIS restriction endonucleases cut at various distances from the endonuclease recognition site and the use of different type IIS restriction endonucleases to adjust this distance is contemplated.
- Step 3F Ligation of ends to form circularized fragment.
- Step 3F may be performed according to the second method (step 2F) with the exception that a signature tag is not used (See figure 6D).
- an exonuclease may be used following ligation to remove non-circularized fragments and to reduce the presence of concatemerized fragments. Since a properly recircularized DNA fragment has no exposed 5' or 3' ends, it is resistant to exonuclease digestion. Further, a concatemer, being larger, would have a higher chance of having exposed 5' or 3' ends due to nicks. Exonuclease treatment would also remove these concatemers with nicks.
- the circularized DNA may be amplified by rolling circle amplification.
- an oligonucleotide may be used to hybridize to one strand of the recircularized DNA.
- This oligonucleotide primer is extended with a polymerase. Since the template is a circle, the polymerase will generate a single stranded concatemer having multiple repeats of the target DNA.
- This single stranded concatemer may be made double stranded by hybridizing a second primer to it and elongating from this second primer. For example, this second primer may be complementary to the adaptor sequence of this single stranded concatemer.
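Why a circular template yields a concatemer can be shown with a short sketch. It is a simplification under stated assumptions: the function name and toy circle are hypothetical, and the template is simply read around the circle in one direction, ignoring strand complementarity and polymerase details.

```python
def rolling_circle_readout(circle, start, n_bases):
    """Read `n_bases` around a circular sequence, beginning at index `start`.

    Because the template has no end, the readout wraps around and repeats
    the circle, giving a tandem concatemer.
    """
    size = len(circle)
    return "".join(circle[(start + i) % size] for i in range(n_bases))


# Hypothetical 5-base circle read for 12 bases starting at index 2.
product = rolling_circle_readout("ACGTT", start=2, n_bases=12)
print(product)  # GTTACGTTACGT
```

Each full trip around the circle adds one more copy of the target and adaptor to the product, which is why a primer complementary to the adaptor finds multiple binding sites on the concatemer.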
- the resulting double stranded concatemer may be used directly for the next step.
- Step 3G Digestion/fragmentation of DNA.
- the circularized nucleic acid or the concatemerized nucleic acid from rolling circle amplification is digested with a Type IIS restriction endonuclease.
- each adaptor contains at least one type IIS restriction endonuclease cleavage site.
- a type IIS restriction endonuclease will recognize the type IIS restriction endonuclease cleavage site on the adaptor and cleave the nucleic acid about 10 to 20 basepairs away. Examples of type IIS restriction endonuclease include MmeI (about 20 bp), EcoP15I (25 bp) or BpmI.
- This step will produce short fragments (10 to 100 bp) of DNA comprising two ends of a larger DNA fragment, with an adaptor region between the two ends ( Figure 6E).
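The structure produced by this step (two fragment ends flanking the adaptor region) can be sketched as follows. This is a toy model under stated assumptions: the function name, the example sequences and the lowercase adaptor placeholder are hypothetical, the recognition sites are assumed to sit at the adaptor edges so that each cut reaches `reach` bases into the target, and the staggered overhang left by a real enzyme is ignored.

```python
def paired_end_tag(fragment, adaptor, reach=20):
    """Model Type IIS digestion of a circularized fragment.

    In the circle, the adaptor region is flanked by the last bases of the
    fragment on one side and the first bases on the other. Cutting `reach`
    bases away on each side releases: end + adaptor + start.
    """
    if len(fragment) < 2 * reach:
        raise ValueError("fragment too short for two non-overlapping tags")
    return fragment[-reach:] + adaptor + fragment[:reach]


# Hypothetical toy fragment: 5 A's, a long middle, 5 G's.
fragment = "A" * 5 + "C" * 30 + "G" * 5
tag = paired_end_tag(fragment, adaptor="nnnn", reach=5)
print(tag)  # GGGGGnnnnAAAAA
```

The interior of the fragment (the run of C's here) is discarded, so the released tag links two sequences that were far apart in the original molecule.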
- An alternative method for producing the same structure is to randomly fragment the circularized nucleic acid using any of a number of DNA fragmenting methods as described elsewhere in this disclosure (e.g., as described in step 1A). This would allow fragments of any size (100 bp, 150 bp, 200 bp, 250 bp, 300 bp or more) to be made.
- DNA comprising adaptor regions may be selectively purified using a solid support with an affinity for biotin such as, for example, streptavidin beads, avidin beads, BCCP beads and the like.
- Step 3H Sequencing
- Manual sequencing by such methods as Sanger sequencing or Maxam-Gilbert sequencing is well known.
- Automated sequencing may be performed, for example, by using an automated sequencing method such as the 454 Sequencing™ method developed by 454 Life Sciences Corporation (Branford, CT), which is also described in application WO 05/003375, filed January 28, 2004, and in copending US patent applications USSN: 10/767,779, filed January 28, 2004; USSN: 60/476,602, filed June 6, 2003; USSN: 60/476,504, filed June 6, 2003; USSN: 60/443,471, filed January 29, 2003; USSN: 60/476,313, filed June 6, 2003; USSN: 60/476,592, filed June 6, 2003; USSN: 60/465,071, filed April 23, 2003; and USSN: 60/497,985, filed August 25, 2003.
- one sequencing adaptor (sequencing adaptor A) may be ligated to one end of the DNA fragment and a second sequencing adaptor (sequencing adaptor B) may be ligated to a second end of the DNA fragment.
- the DNA fragment may be purified away from any unligated sequencing adaptors by binding the biotin to a solid support.
- the isolated nucleic acid fragments may be placed in individual reaction chambers and further amplified by PCR using primers specific for sequencing adaptor A and sequencing adaptor B. By attaching a biotin moiety to either the A or the B adaptor, single stranded DNA which preferentially consists of the A-B fragments can be isolated.
- This amplified nucleic acid may be sequenced using sequencing primers specific for sequencing adaptor A, sequencing adaptor B or a sequencing primer specific for the adaptor (e.g., hairpin adaptor) located in between the two ends.
- Paired end sequencing may be performed using a variation of the above described method called Paired-Reads PET Random Fragmentation as outlined in Figure 12. Results from an experiment according to this fourth method are depicted in Figure 13.
- steps A to D may be performed as described in the second method or third method (i.e., as steps 2A to 2D or steps 3A to 3D).
- step 4D may be performed using SPRI (solid-phase reversible immobilization) to purify exonuclease treated fragments.
- the nucleic acid fragments in Figure 12 are ligated to biotinylated primers and can be purified for example using streptavidin, avidin, reduced affinity streptavidin or reduced affinity avidin coated beads.
- Step 4E may be performed as described in step 2E or step 3E.
- Step 4F may be performed as described in step 3F.
- the linear DNA fragment generated in the last step may be circularized using any known method of circularization as described above for steps 2F or step 3F.
- an optional enrichment step may be performed to enrich for circular nucleic acids.
- nucleic acids that are not circularized may be removed by an exonuclease, which degrades nucleic acids with free ends.
- Covalently closed circular nucleic acids do not have free ends and are resistant to exonuclease attack. Because of this, treatment with an exonuclease would enrich for circular nucleic acid while removing linear nucleic acids.
- fragmentation may be performed using any fragmentation procedure listed in this disclosure.
- One preferred method is to fragment the circular nucleic acids using mechanical shearing. Mechanical shearing may be performed, for example, by vortexing, by forcing nucleic acid in solution through a small orifice, or by another similar procedure described elsewhere in this disclosure.
- one result of mechanical shearing is that nucleic acids of different lengths may be produced (See nucleic acid after step G in Figure 12).
- DNA fragments without an adaptor region in the middle are also produced. See Figure 12.
- the adaptor region is biotinylated.
- DNA comprising adaptor regions may be selectively purified using a solid or semi-solid support with an affinity for biotin such as, for example, streptavidin beads, avidin beads, BCCP beads and the like.
- the product of method 4 may be sequenced using any manual or automatic method available. Such methods are described in detail in Step 3 H above.
- Paired-Read PET Random Fragmentation offers a number of advantages.
- Figure 13 depicts E. coli K12 genomic DNA sequenced using Method 4. As can be seen, significantly longer read length distributions, from less than 50 to about 400, are possible using this method. Further, fragment lengths of about 3 kb can be produced and their ends sequenced. This shows that method 4 provides superior gap closure performance compared to the other methods.
- Fifth Method
- Paired end sequencing may be performed using a variation of the above described methods as outlined in Figure 15.
- the adaptor can be designed as a Deoxyinosine Hairpin Adaptor which incorporates deoxyinosine nucleotides (herein also referred to as Inosines) on opposite strands of the double-stranded region of the hairpin.
- E. coli Endonuclease V introduces a single-stranded cut (nick) between the 2nd and 3rd nucleotide 3' from an inosine nucleotide.
- the relative placement of the Inosines in the hairpin adaptor determines whether a 3' single stranded overhang ( Figure 14 A and Figure 14 B), a 5' single stranded overhang ( Figure 14 C and Figure 14 D), or a blunt end (no overhang) ( Figure 14 E), will be generated upon EndoV cleavage of both strands.
- the sequence of the hairpin adaptor can also be designed to produce a non-palindromic ( Figure 14 A and B) or palindromic ( Figure 14 A and C) single stranded overhang upon EndoV cleavage.
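How the relative placement of the two inosines determines the end type can be worked through with a small geometric sketch. It rests on several assumptions that are not from the disclosure: the stem of the hairpin is modeled as two equal-length strands, each written 5'->3' and each containing exactly one inosine (written `I`); base pairing is not validated; and the function names and placeholder strands are hypothetical. The only rule taken from the text is that EndoV nicks between the 2nd and 3rd nucleotide 3' of the inosine.

```python
def nick_offset(strand, inosine="I"):
    """Number of nucleotides lying 5' of the nick on a strand written 5'->3'.

    The nick falls between the 2nd and 3rd nucleotide 3' of the inosine,
    so the inosine plus two further nucleotides stay on the 5' side.
    """
    return strand.index(inosine) + 3


def overhang_after_endov(top, bottom):
    """Classify the end produced when both strands of a duplex are nicked."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length in this toy model")
    cut_top = nick_offset(top)
    # Express the bottom-strand nick in top-strand coordinates.
    cut_bottom = len(bottom) - nick_offset(bottom)
    if cut_top > cut_bottom:
        return ("3' overhang", cut_top - cut_bottom)
    if cut_top < cut_bottom:
        return ("5' overhang", cut_bottom - cut_top)
    return ("blunt", 0)


# Placeholder strands: N is any base, I is the inosine.
print(overhang_after_endov("NNINNNNNNNNN", "NNNNINNNNNNN"))
print(overhang_after_endov("NNNNINNNNNNN", "NNNNINNNNNNN"))
print(overhang_after_endov("INNNNNNNNNNN", "NNNNINNNNNNN"))
```

Moving one inosine along its strand while holding the other fixed shifts one nick relative to the other, which is what switches the outcome between blunt, 3' overhang and 5' overhang in this model.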
- the adaptor may contain a Type IIS restriction endonuclease recognition site (such as MmeI) as discussed elsewhere in this disclosure.
- Step 5A Figure 15 step A
- step A may be performed substantially as described for Step 1A.
- the target DNA can be fragmented by any of the physical or biochemical methods known in the art, as described above.
- the resulting fragments may be size-fractionated by any of the size-fractionation methods described elsewhere in this disclosure.
- the ends of the target DNA may be polished by any of the polishing methods described herein, and can be ligated to Deoxyinosine Hairpin Adaptors described above to form adaptor tagged target DNA.
- Step 5D Figure 15 step D
- the ligation reaction may be treated with one or more exonucleases (as discussed elsewhere herein) and size fractionated by any of the methods described herein to enrich the desired reaction products.
- Step 5E Figure 15 step E
- the adaptor tagged target nucleic acids are cleaved with EndoV.
- Conditions for the cleavage reaction may be any of the conditions described by Yao et al (Yao M and Kow YW, J Biol Chem. 1995, 270(48):28609-16; Yao M and Kow YW, J Biol Chem. 1994, 269(50):31390-6; Yao M et al., Ann N Y Acad Sci. 1994, 726:315-6; and Yao M et al., J Biol Chem. 1994, 269(23):16260-8). The skilled artisan will appreciate that similar conditions can also be used.
- steps F to H may be performed as described in the second, third, or fourth method (i.e. as steps 2F to H or steps 3F to H or steps 4F to H).
- the Deoxyinosine Hairpin Adaptors of the fifth method are advantageous because EndoV will only cleave in the presence of Inosine or certain sites of damage or base mispairing in DNA. Therefore, the target nucleic acid will not be cleaved by the EndoV treatment. Thus, as the EndoV sites are unique to the adaptors, the target DNA need not be protected by methylation as in some above described embodiments. The elimination of the methylation step saves time, and problems related to incomplete methylation of the target DNA are eliminated. Furthermore, the EndoV digestion is very rapid as compared to the EcoRI digestion, therefore shortening the time required to perform the method.
- paired-end sequencing may be performed by methods comprising some or all of the following steps, as depicted in Figures 17 and 18.
- Step 6A - Fragmentation of target DNA Figure 17 A
- the polynucleotide molecules of the target DNA sample, such as genomic DNA, are fragmented.
- the fragments range from about 1.5 to about 5 kb in length.
- the fragmentation can be accomplished by any of the physical and/or biochemical methods described elsewhere in this disclosure.
- the target DNA is randomly sheared by physical force, for example by use of a HydroShear® apparatus (Genomic Solutions).
- the sheared DNA may then be purified with regard to the desired fragment size.
- This optional size selection may be achieved through any of the size selection methods known in the art and disclosed herein, such as electrophoresis and/or liquid chromatography.
- the sheared DNA sample is selected for size by purification on SPRI® size exclusion beads (Agencourt; Hawkins et al., Nucleic Acids Res. 1995 (23): 4742-4743).
- sequencing the ends (in pairs) of fragments of about 2-2.5 kb can allow for contig ordering in a typical bacterial genome sequencing experiment. Larger fragments may be advantageous for sequencing of the genomes of higher organisms, such as fungi, plants and animals.
- the adaptors may be cut with one or more restriction enzymes in preparation for circularization.
- the target DNA is protected from digestion by modification with the corresponding methylase(s).
- the adaptors are hairpin adaptors, and carry an EcoRI restriction site ( Figure 18 A).
- the EcoRI restriction sites present in the sample DNA fragments are methylated using EcoRI Methylase to preserve their integrity when the EcoRI cohesive ends are generated out of the Hairpin Adaptors, before circularization by ligation.
- any frayed ends may be made blunt and ready for ligation by enzymatically either "filling-in" with a DNA polymerase and/or by "chewing-back" with an exonuclease (e.g. Mung Bean nuclease).
- some DNA polymerases also have an exonuclease activity.
- the 5' ends of the fragments will be phosphorylated with a polynucleotide kinase.
- T4 DNA polymerase and T4 polynucleotide kinase are used for filling-in and phosphorylation, respectively.
- the T4 DNA polymerase is used to "fill-in" 3'-recessed ends (5'-overhangs) of DNA via its 5'->3' polymerase activity, while its single-stranded 3'->5' exonuclease activity removes 3'-overhang ends.
- the kinase activity of T4 PNK adds phosphate groups to 5'-hydroxyl termini.
- double-stranded oligonucleotide adaptors are ligated to the ends of the target DNA fragments.
- the adaptors are hairpin adaptors ( Figure 18 A).
- One advantage of hairpin adaptors is that adaptor-adaptor ligation events will only lead to adaptor dimers, i.e. the formation of multimer adaptor concatemers is prevented.
- their hairpin structure will protect the sample fragments from the exonuclease digestion (Step 6 E) used to remove unligated fragments.
- One preferred hairpin adaptor design shown in Figure 18 A contains EcoRI and MmeI restriction sites.
- the EcoRI may be used to create cohesive termini on the ends of each fragment (Step 6 F), allowing for their circularization (Step 6 G).
- MmeI is a Type IIS restriction enzyme which cuts DNA 20 bp away from its recognition site; it is used to cut into the ends of the circularized sample fragments, generating the Paired End tags to be sequenced.
- EcoRI may be replaced by any of a large number of other endonucleases, with concomitant changes in the nucleotide sequence of the adaptor oligonucleotide and use of the appropriate methylase for protection of the target DNA fragments.
- MmeI may be replaced with other Type IIS restriction enzymes, as long as the chosen enzyme cuts at a sufficient distance from its restriction site to generate paired ends of sufficient length to allow downstream sequence assembly.
- the hairpin adaptors are biotinylated, for example at the site shown in Figure 18 A. Other biotinylation sites are also suitable and can be chosen by the skilled artisan.
- the biotin moiety allows for the optional selection of adaptor-containing paired end fragments, and an optional immobilization of the paired end library fragments (after MmeI digestion) during the ligation of the paired end adaptors, during the fill-in reaction (fragment repair), and during the paired end library amplification.
- Step 6 E -Exonuclease Selection ( Figure 17 E)
- an exonuclease digestion follows the ligation of the Hairpin Adaptors, to remove any DNA that is not properly fitted with Hairpin Adaptors at both ends; and purification on SPRI size exclusion beads removes small unwanted molecular species, such as adaptor-adaptor dimers.
- the exonuclease digestion may be performed with one or more of various exonucleases well known in the art.
- the digestion is accomplished with a combination of activities that together allow digestion of single stranded and double stranded DNA, both in the 3'->5' and 5'->3' directions.
- the exonuclease mixture contains E. coli Exonuclease I (3'->5' single strand exonuclease), Phage Lambda Exonuclease (5'->3' single and double strand exonuclease) and Phage T7 Exonuclease (5'->3' double strand exonuclease, can initiate at gaps and nicks).
- endonucleolytic cleavage by EcoRI is used to create cohesive termini on the ends of each fragment by cutting the hairpin adaptors ( Figure 18 A) and allowing for the fragments' circularization. Digestion with EcoRI will remove the hairpin structures at the ends of the fragments, leaving cohesive ends. The internal EcoRI sites present in the sample DNA are protected by the methylation done earlier in Step 6B.
- the fragments are then circularized by intramolecular ligation of their cohesive EcoRI ends.
- the site of the ligation thus has the two partial Hairpin Adaptors (head to head, with a reconstituted EcoRI site; 44 bp total), flanked on either side by the ends of the sample fragment.
- Another exonuclease digestion is carried out to remove any non-circularized DNA.
- Step 6 H - MmeI digestion ( Figure 17 H)
- The circularized DNA fragments are then restricted with MmeI.
- This Type IIS restriction enzyme cuts approximately 20 bp away from its restriction site (leaving a 2 nt 3'-overhang, i.e. the cut is at 20/18 nt; the enzyme also generates some minority products with cuts ranging from 19 to 22 bp from the site).
- MmeI restriction fragments without the ligated "double" hairpin adaptor may optionally be eliminated in this step.
- the library of paired end fragments may be immobilized (and isolated from other MmeI restriction fragments) by binding of the biotin tag present in the hairpin adaptors to streptavidin or avidin beads.
- Step 6 J Paired End Adaptor Ligation
- the ends of the paired end library fragments generated in Step 6 H and optionally purified in Step 6 I are ligated to double stranded adaptors, termed paired end library adaptors or paired end adaptors ( Figure 18 B).
- paired end adaptors provide priming regions to support both amplification and nucleotide sequencing, and may also comprise a short (e.g. 4 nucleotides) "sequencing key" sequence useful for well finding on a 454 Sequencing™ System.
- the adaptors may have "degenerate" 2-base single stranded 3' overhangs.
- Degenerate means that the 2 overhanging bases are random, i.e. they may each be either G, A, T, or C. If an enzyme other than MmeI were used, the skilled artisan would be readily able to design paired end adaptors compatible with that other enzyme.
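Since each of the two overhanging positions may independently be G, A, T or C, a fully degenerate 2-base overhang corresponds to 4 x 4 = 16 distinct overhang sequences in the adaptor pool. A minimal sketch of that enumeration follows; the function name is hypothetical.

```python
from itertools import product


def degenerate_overhangs(length=2, alphabet="GATC"):
    """List every sequence a fully degenerate overhang of `length` can take."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


overhangs = degenerate_overhangs()
print(len(overhangs))  # 16
print(overhangs[:4])   # ['GG', 'GA', 'GT', 'GC']
```

Any given MmeI-generated end matches exactly one of these sequences, so only a fraction of the adaptor pool is compatible with each individual fragment end.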
- the exemplary adaptors shown in Figure 18 B are designed to strongly favor the directional ligation to the paired end library fragments with each Adaptor containing a degenerate 2 bp 3'-overhang at their 3' end which can solely ligate to the ends of the MmeI-generated paired end library fragments (provided the 5' ends of the adaptors are not phosphorylated, see below).
- Adaptors may be combined with the paired end library fragments in a ligation reaction that contains a large molar excess of adaptors (15:1 adaptor:fragment ratio), both to maximize utilization of the paired end library fragments and to minimize the potential of forming paired end library fragment concatemers.
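The amount of adaptor implied by a 15:1 molar ratio can be estimated from the mass and length of the fragments. The sketch below is an approximation under stated assumptions: it uses an average mass of about 650 g/mol per base pair of double stranded DNA, a commonly used rule of thumb that is not taken from the disclosure, and the function names and example quantities are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; assumed approximation


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double stranded DNA (ng) to picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def adaptor_pmol_needed(fragment_ng, fragment_bp, ratio=15.0):
    """Picomoles of adaptor for a given adaptor:fragment molar ratio."""
    return ratio * dsdna_pmol(fragment_ng, fragment_bp)


# Hypothetical example: 650 ng of 1000 bp fragments is about 1 pmol,
# so a 15:1 excess calls for about 15 pmol of adaptor.
print(adaptor_pmol_needed(650, 1000))
```

Shorter fragments contain more molecules per nanogram, so the same mass of a short paired end library needs proportionally more adaptor to hold the ratio.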
- the adaptors themselves may be non-phosphorylated to minimize the formation of adaptor dimers, though as a consequence, the ligation products must be subsequently repaired by a fill-in reaction (Step 6 K).
- nicks may be repaired using a strand-displacing DNA polymerase, whereby the polymerase recognizes the nicks, displaces the nicked strands (to the free 3'-end of each Adaptor), and extends the strand in a manner that results in the repair of the nicks and in the formation of full-length dsDNA.
- the strand-displacing DNA polymerase may be Bst DNA polymerase, Large Fragment.
- Other strand-displacing DNA polymerases known in the art are also suitable for this step, such as phi29 DNA Polymerase, DNA Polymerase I (Klenow Fragment), or Vent® DNA Polymerase.
- the "adapted" paired end DNA library may be amplified.
- the amplification is performed by PCR, but other nucleic acid amplification methods known in the art and/or described herein may also be used.
- the oligonucleotides F-PCR and R-PCR shown in Figure 18B may be used as PCR primers.
- the "adapted" paired end DNA library is then sequenced.
- individual molecules from the library are sequenced.
- individual molecules from the library may be clonally amplified.
- the clonal amplification is performed by bead emulsion PCR as described in International Patent Application Nos. WO 2005/003375, WO 2004/069849, WO 2005/073410, each incorporated herein by reference in toto.
- paired-end sequencing may be performed by methods comprising some or all of the following steps, as depicted in Figures 21 - 25.
- the described embodiment provides an especially advantageous, and inventive, process that provides an alternative to circularization by ligation and is amenable for implementation with some or all of the methods and variations described above.
- the presently described embodiment is particularly efficient for generating paired end distances of 10Kb or greater (i.e. for a paired end distance of about
- the described recombination based strategy is also useful for circularizing fragments that are shorter than 10Kb (i.e. for paired end distances of about 3Kb, or 8Kb).
- the presently described embodiment employs an intramolecular recombination based strategy for circularization of nucleic acid molecules that comprise the desired sequence lengths for the greater paired end distances, and provides a substantial advantage in the efficiency for circularization of nucleic acid molecules especially large nucleic acid molecules.
- Some preferred embodiments include what is referred to as an in vitro excision by recombination reaction method that employs a Cre/Lox type Site Specific Recombinase (hereafter referred to as "SSR") system for circularizing a linear adapted target fragment to produce a circular nucleic acid comprising the target fragment and a second excised linear fragment comprising a hybrid adaptor sequence. One example of such a method is illustrated in Figure 21.
- Figure 21 provides an exemplary overview of an SSR based strategy for producing a library of sequencable paired end template nucleic acid molecules having a pair distance of 10 Kb or greater.
- Figure 21 illustrates the process of fragmenting genomic or other desired DNA and attaching adaptors 2105 and 2107 producing adapted fragment 2100 that is then selected for a desired length.
- An SSR recombination step is also illustrated that produces circular product 2150 and linear product 2155 from adapted fragment 2100, where circular product 2150 is mechanically sheared to produce a linear paired end template 2160, which is subsequently amplified to produce population 2170 comprising many substantially identical copies of template 2160.
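The outcome of the recombination step (one circular product carrying the target and one linear product carrying a hybrid adaptor) can be sketched with strings. This is a toy model under stated assumptions: the function name is hypothetical, the lowercase `loxp` marker is a placeholder rather than the real site sequence, the two sites are assumed to be in the same orientation, and the crossover is modeled as occurring at the start of each site copy.

```python
def excise_between_sites(molecule, site):
    """Model excision between two directly repeated recombination sites.

    Returns (circle, linear): the circle holds one site copy plus the
    intervening DNA; the linear product joins the outer flanks through
    the remaining site copy.
    """
    first = molecule.find(site)
    second = molecule.find(site, first + len(site)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("two copies of the site are required")
    circle = molecule[first:second]  # to be read as a closed circle
    linear = molecule[:first] + molecule[second:]
    return circle, linear


# Hypothetical adapted fragment: outer adaptor parts flank site-target-site.
adapted = "AAA" + "loxp" + "GATTACA" + "loxp" + "TTT"
circle, linear = excise_between_sites(adapted, "loxp")
print(circle)  # loxpGATTACA
print(linear)  # AAAloxpTTT
```

In the circle the single remaining site sits between the two original ends of the target, which is the junction that later yields the paired end template after shearing.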
- the raw genomic or other source of polynucleotide molecules of a target DNA sample are fragmented into molecules longer than about 10,000 bases, longer than about 20,000 bases, longer than about 50,000 bases, longer than about 100,000 bases, longer than about 250,000 bases, longer than about 1 million bases, or longer than about 5 million bases.
- the fragments range from about 10Kb to about 50 Kb, to about 100Kb, or to greater than 100Kb in length.
- the fragmentation can be accomplished by any of the physical and/or biochemical methods described elsewhere in this disclosure.
- the target DNA is randomly sheared by physical force, for example by use of a HydroShear® apparatus (Genomic Solutions). It will be appreciated, however, that any of the methods of creating fragments described herein may be used if the selected method is capable of producing the desired fragment length.
- the ends of each of the fragments may be polished by any of the methods described elsewhere in this disclosure, such as for instance the method described in step 6C above.
- blunt ends are preferable for the subsequent adaptor ligation.
- any frayed or overhanging ends may be made blunt and ready for ligation by enzymatically either "filling-in" with a DNA polymerase and/or by "chewing-back" with an exonuclease (e.g. Mung Bean nuclease).
- some DNA polymerases also have exonuclease activity.
- the 5' ends of the fragments will be phosphorylated with a polynucleotide kinase.
- T4 DNA polymerase and T4 polynucleotide kinase (T4 PNK) are used for filling-in and phosphorylation, respectively.
- the T4 DNA polymerase is used to "fill-in" 3'-recessed ends (5'-overhangs) of DNA via its 5'->3' polymerase activity, while its single-stranded 3'->5' exonuclease activity removes 3'-overhang ends.
- the kinase activity of T4 PNK adds phosphate groups to 5'-hydroxyl termini.
- Step 7C Adaptor Ligation
- the adaptors may include loxP adaptors, an example of which is illustrated in Figure 22.
- Figure 22 provides an illustrative example of 2 double stranded adaptor species, loxP-6F adaptor 2105 and loxP-6R adaptor 2107, each having a first blunt end lacking a 5' phosphate, and a second end with a 3' overhang of three sequence positions and a phosphorylated 5' end.
- the described 3' overhang is not limited to three sequence positions; there may be greater or fewer than three depending upon the desired conditions.
- the first blunt end of each of adaptors 2105 and 2107 is ligated to the polished (i.e. blunt) ends of the target DNA fragments such that the loxP 2200 region in each adaptor is in the same directional orientation in order to promote circularized products as will be described in detail below.
- the second end of both adaptor species comprising the overhang and 5' phosphorylation of each adaptor provides specific advantages.
- the first advantage is the inhibition of multimer adaptor formation, which would otherwise produce adaptor concatemers as described above.
- adaptors 2105 and 2107 are ligatable to each other, but such adaptor ligation events are restricted to forming dimers, as opposed to long concatemers that are more difficult to distinguish from adapted target molecules and that in some cases consume a significant proportion of the adaptor molecules, making them unavailable for ligation to target molecules.
- the second advantage is that the 5' phosphorylation and 3' overhang each improve the efficiency of exonuclease degradation, and thus improve removal of uncircularized molecules, as described in further detail below.
- the adaptor ligated nucleic acid fragments 2100 may be purified with regard to the desired fragment size.
- This optional size selection step may be performed using any of the size selection methods known in the art and disclosed herein, such as electrophoresis and/or liquid chromatography.
- the sheared DNA sample is selected for size by gel electrophoresis as described above.
- gel based methods produce size fractionated DNA fragments with a distribution of lengths around the desired length, such as a range that is within 25% of the desired length.
- a targeted 20 Kb size fraction would produce a pool of fragments which are 20 Kb +/- 5 Kb (i.e., a range of 15 Kb to 25 Kb fragment lengths).
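The size window in the example above is simple arithmetic, and a minimal Python sketch may make it concrete. The function names and the default 25% tolerance parameter are illustrative assumptions, not part of the disclosed method.

```python
def size_window(target_bp, tolerance=0.25):
    """Return (min, max) fragment length for a target size and fractional tolerance."""
    delta = round(target_bp * tolerance)
    return target_bp - delta, target_bp + delta


def in_window(length_bp, target_bp, tolerance=0.25):
    """True if a fragment length falls inside the selected size fraction."""
    low, high = size_window(target_bp, tolerance)
    return low <= length_bp <= high


print(size_window(20_000))  # (15000, 25000)
```

A fragment of 18 Kb would pass `in_window(18_000, 20_000)`, while a 26 Kb fragment would not.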
- alternative size fractionation techniques may be employed, particularly where longer fragments are desired to increase the paired end distance.
- One such technique amenable for size fractionation of larger molecules is generally referred to as "Pulse Field Gel Electrophoresis" (hereafter referred to as PFGE and described by Schwartz DC, Cantor CR. Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell.
- PFGE enables size fractionation of large sized molecules at far greater resolution than achievable with standard gel electrophoresis methods.
- standard gel electrophoresis methods are generally ineffective at size separating large molecules efficiently, especially nucleic acid molecules with a sequence length of about 20Kb or greater.
- PFGE methods provide accurate size discrimination for such large nucleic acid molecules.
- the linear adapted nucleic acid sequence fragments 2100 are exposed to a site specific recombinase such as the Cre recombinase enzyme that recognizes the 34bp loxP regions 2206 in adaptors 2105 and 2107 ligated to the ends of and flanking the target nucleic acid sequence.
- the Cre recombinase excises a short linear fragment (illustrated in Figure 21 as linear product 2155) comprising a hybrid of the loxP region, and circularizes the target nucleic acid producing a circular molecule (illustrated in Figure 21 as circular product 2150) with a second hybrid loxP region and the target nucleic acid.
- Figures 21 and 23 illustrate both recombination products as linear product 2155 and circular product 2150 generated by Cre recombinase.
- Figure 22 further illustrates the composition of recombined adaptor 2110 with hybrid loxP region 2208 present in circularized product 2150.
- the Cre recombinase enzyme cuts within loxP regions 2206 in both adaptors 2105 and 2107 and recombines them to form products with loxP regions that are hybrids of region 2206 from both of the original adaptors 2105 and 2107.
- the Cre recombinase enzyme binds in loxP region 2206 of each of the 6F 2105 and 6R 2107 adaptors and cuts each at the same sequence position.
- the bound recombinase/nucleic acid complexes are positioned at each end of the adapted target nucleic acid sequence fragment and react with each other to join the cut ends from the 6F 2105 and 6R 2107 adaptors together thus circularizing the nucleic acid fragment.
- the recombinase enzymes join a segment cut from the 6F 2105 adaptor lacking 8bp directional sequence 2200 with a segment of the 6R 2107 adaptor comprising 8bp directional sequence 2200 resulting in circular product 2150.
- the 8bp directional sequence 2200 element from the 6F 2105 adaptor is joined to the remaining 6R 2107 adaptor lacking the 8bp directional sequence 2200 element resulting in the short hybrid adaptor in linear product 2155 described above.
- the resulting hybrid adaptor in circular product 2150 is illustrated in Figure 22 as adaptor 2110 that comprises loxP region 2208.
- Embodiments of region 2208 comprise a sequence composition that is essentially the same as loxP region 2206.
- region 2208 of adaptor 2110 in circular product 2150 also includes two associated embodiments of enrichment tag 2205 (one tag originating from each of adaptors 2105 and 2107). In some embodiments the presence of two embodiments of enrichment tag 2205 improves the efficiency of subsequent enrichment steps. As illustrated in Figure 22 the enrichment tags may include biotin, however it will be appreciated that any type of enrichment tag described herein (i.e. binding pairs) or generally known in the art may be employed. It will also be noted that adaptor 2110 also includes the blunt ends from the original adaptors 2105 and 2107 ligated to the target DNA fragment in circular product 2150.
- Figures 22 and 23 provide an example of the importance of the directionality of the loxP sites for producing circularized products from the SSR process.
- a wild type version of loxP region 2206 (indicated by box around sequence region) is associated with adaptors 2105 and 2107.
- other mutant variants may be employed as long as the SSR functionality is retained.
- loxP regions possess directionality characteristics and that such characteristics influence the products when exposed to the Cre recombinase.
- region 2206 of both 6F adaptor 2105 and 6R adaptor 2107 comprise features typical to the Cre/Lox system that includes directional loxP sequence 2200 that is 8bp in length (directionality indicated by arrow associated with sequence 2200). Further, region 2206 comprises palindromic sequence elements of about 13bp flanking each side of directional sequence 2200.
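As a rough illustration of the site structure described above (two palindromic arms of about 13bp flanking an 8bp directional sequence), the following Python sketch splits a 34bp site into its parts and reports its orientation. The sequence constant is the commonly cited wild-type loxP sequence, which is an assumption here because this text does not spell it out; all function names are hypothetical.

```python
# Commonly cited wild-type loxP sequence (assumption: not given in this text).
LOXP_WT = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_loxp(site):
    """Split a 34 bp site into (left 13 bp arm, 8 bp directional sequence, right 13 bp arm)."""
    if len(site) != 34:
        raise ValueError("expected a 34 bp site")
    return site[:13], site[13:21], site[21:]


def arms_are_inverted_repeats(site):
    """True if the two flanking arms are reverse complements of each other."""
    left, _, right = split_loxp(site)
    return revcomp(left) == right


def orientation(site, reference=LOXP_WT):
    """'+' if the 8 bp element matches the reference, '-' if it matches its reverse complement."""
    spacer = split_loxp(site)[1]
    ref_spacer = split_loxp(reference)[1]
    if spacer == ref_spacer:
        return "+"
    if spacer == revcomp(ref_spacer):
        return "-"
    return None  # not recognisable as this reference site
```

Because the arms are inverted repeats, only the central 8bp element distinguishes a site from its reverse complement, which is why it carries the directionality.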
- Figure 23 provides an illustrative example of the SSR products generated based upon the relative directional orientation of loxP regions 2206.
- Figure 23A provides a representative illustration of adapted fragment 2100' having two loxP regions 2206 oriented in an opposing directional relationship and the linear inversion product 2305 (indicated by change in position of shaded region 2300) generated by Cre recombinase.
- Figure 23B provides a representation of adapted fragment 2100" having two loxP regions 2206 oriented in the same directional relationship and the products generated by Cre recombinase, which include a first circular product 2150 that includes region 2208 (in recombined adaptor 2110 as described above) and a second linear product 2155 that is excised from adapted fragment 2100 and comprises a second recombined region 2208.
- the recombination reaction of Figure 23B is "bidirectional" as illustrated by the directional arrows where the excision arrow 2334 indicates the greater magnitude of the direction of the reaction as compared to the integration direction indicated by the integration arrow 2336.
- arrows 2334 and 2336 are provided for illustrative purposes only and are not drawn to the exact scale of the actual magnitude of the directionality that may depend, at least in part, on the reaction conditions.
- the reaction conditions are optimized to promote the excision direction and formation of circular products.
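The dependence of the recombination products on the relative orientation of the two loxP regions (Figures 23A and 23B) can be summarised in a toy model. This Python sketch is only a bookkeeping aid under simplifying assumptions: segments are opaque labels, the hybrid site is written simply as "loxP", and the function name is hypothetical.

```python
def cre_recombine(left_end, orient_a, target, orient_b, right_end):
    """Toy model of Cre acting on: left_end-[loxP a]-target-[loxP b]-right_end.

    target is a list of labelled segments; orient_a and orient_b are '+' or '-'.
    """
    if orient_a == orient_b:
        # Same orientation: excision, giving a circle plus a short linear product
        # (compare circular product 2150 and linear product 2155).
        return {
            "circular": ["loxP"] + list(target),
            "linear": [left_end, "loxP", right_end],
        }
    # Opposing orientation: the intervening segment is inverted and the
    # molecule stays linear (compare inversion product 2305).
    return {
        "linear": [left_end, "loxP"] + list(reversed(target)) + ["loxP", right_end]
    }


same = cre_recombine("6F-distal", "+", ["a", "b", "c"], "+", "6R-distal")
opposed = cre_recombine("6F-distal", "+", ["a", "b", "c"], "-", "6R-distal")
```

Only the same-orientation case yields a circular product, which is why the adaptors are ligated so that both loxP regions point the same way.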
- Step 7F Removal of Non-Circular Nucleic Acids
- all of the linear nucleic acid molecules including the excised products 2155, inverted products 2305, adaptor dimers, un-adapted target nucleic acid fragments, etc. can be removed using any of the methods described elsewhere in this specification.
- an exonuclease treatment strategy may be employed to effectively remove all of the linear nucleic acid molecule products or other remaining linear fragments.
- it may be desirable to employ more than one type of exonuclease to increase the efficiency of removal of any undesirable linear nucleic acid molecule.
- two or more exonuclease species may be employed, which may include, but are not limited to, an Exonuclease 1 (may also be referred to as EXO 1) exonuclease species and what is referred to as an ATP Dependent DNAse to digest linear double-stranded DNA (e.g. Plasmid-Safe™ ATP-Dependent DNase available from Epicentre Biotechnologies, Madison WI).
- Step 7G Linearization
- the circular nucleic acid products 2150 may then be fragmented to generate linear nucleic acid molecules comprising the end regions from the original target nucleic acid with an adaptor region in the middle using any of the various methods described elsewhere in this specification.
- MmeI or other Type IIs restriction sites as described elsewhere in this specification, however it will readily be appreciated that such sites could be included.
- the linear fragment may then be fragmented again using mechanical means, described in greater detail below and elsewhere in this specification, where the mechanical fragmentation selects for a particular fragment length that is substantially greater than the combination of the 20bp tag and 34bp loxP region.
- the result is a second tag in the pair of greater length than the first and a substantially reduced possibility of fragmentation within the intervening region comprising adaptor 2110.
- the preferred lengths of the second tag in the pair may be based, at least in part, on the average or total read length capability of sequencing method employed to generate sequence data for the resulting paired end fragment.
- carrier DNA may also be added prior to the linearization step in order to prevent inadvertent loss of valuable target DNA fragments during subsequent purification steps which may be present in low quantities and/or of low quality.
- a type II restriction site such as MmeI
- it may also be advantageous in the same or alternative embodiments to use other types of carrier DNA for other purposes, which may be more suitable for the particular application.
- One such purpose includes analysis of the efficiency of mechanical manipulation steps such as the linearization step described above.
- carrier DNA products are indistinguishable from paired end template 2160 when pooled in a sample.
- it is desirable to employ the carrier DNA for analysis of the mechanical manipulation step, but generally undesirable to consume valuable resources from the sequencing process to produce sequence information from the carrier DNA, which is not of interest.
- One means by which to limit the sequencable quantities of the carrier DNA is to render it un-amplifiable by PCR or other amplification method.
- the pool of linearized products such as paired end template 2160
- the overall representation of the carrier DNA is substantially reduced in the amplified population of sequencable templates represented as population 2170.
- circular carrier DNA such as pUC19 may be specifically treated with short wavelength ultraviolet light, effectively cross-linking the strands by creating pyrimidine dimers and rendering it un-amplifiable, so that it is not substantially represented in the final sample and sequenced.
- the treated carrier DNA may be added to the sample with the circularized target DNA (i.e. circular product 2150) and linearized so that the sample includes linearized representatives from both the target (i.e. paired end template 2160) and carrier DNA populations.
- the entire sample may be analyzed to determine the efficiency of the linearization, for instance by using a LabChip DNA 7500 chip available from Agilent Technologies, Inc., where the carrier DNA enables a more accurate determination due to the increase in nucleic acid volume.
- the copy number of the carrier DNA will not increase resulting in an amplified sample having a substantially greater proportion of target DNA molecules.
- Step 7H Enrichment
- further, illustrated in Figure 22 is an embodiment of enrichment tag 2205 associated with each adaptor species, which may include a biotin tag or other type of enrichment tag described elsewhere in this specification or generally known in the art.
- an enrichment tag such as a biotin moiety allows for the optional selection of adaptor-containing paired end fragments, and an optional immobilization of the paired end library fragments (after linearization of the circular nucleic acid) during the ligation of the paired end adaptors, during the fill-in reaction (fragment repair), and during the paired end library amplification.
- An additional advantage of loxP adaptors 2105 and 2107 described herein is that adaptor-adaptor ligation events will only lead to adaptor dimers, i.e. the formation of multimer adaptor concatemers is prevented.
- steps J-L of the sixth method steps 6J-6L
- Figure 25 illustrates the substantial advantage that the long paired end reads of about 20 Kb provide over the shorter paired end reads of about 3 Kb in the assembly of the E. coli K12 genomic scaffold, and an even greater advantage over the well known shotgun based approaches.
- Method 7 provides other advantages over ligation based methods because it requires fewer processing steps requiring fewer valuable resources such as technician time, instrument time and usage, and reagent usage.
- the hairpin adaptors may be replaced with overhang adaptors (Figure 8).
- the overhang adaptor may be biotinylated and may, for example, have the sequence of: 5'OH-AATTC AAACCCTTTCGGT TCCAAC-3'OH (Seq
- the six 3' terminal nucleotides of the upper strand (Seq ID NO:28), i.e., TCCAAC, in conjunction with the complementary nucleotides of the lower strand (Seq ID NO:29), form a recognition site for the Type IIS restriction enzyme MmeI.
- Self ligation i.e., circularization
- an exonuclease digest may subsequently be performed to remove unligated non-circular DNAs. Since DNA fragments not ligated to overhang adaptors have blunt ends due to polishing, they are not expected to ligate as efficiently as the 5' overhang ends (sticky ends) of the fragments with two overhang adaptors ligated one on each side.
- MmeI digest is used to remove DNA distal to the overhang adaptors (see Figure 8F), leaving about 20 bases of the original genomic DNA on each side of the ligated overhang adaptors (Figure 8G).
- the fragments with overhang adaptors are purified using a streptavidin bead which binds to the biotinylated adaptors (Figure 8H).
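The outcome of this digest, an adaptor region flanked by roughly 20 bases of genomic sequence on each side, can be sketched as a simple string operation. This Python sketch is a simplification under stated assumptions: it ignores the staggered cut that a real Type IIS enzyme leaves, treats the tag length as exactly 20, and uses a hypothetical function name and placeholder sequences.

```python
def mmei_paired_tag(left_genomic, adaptor, right_genomic, tag_len=20):
    """Keep tag_len bases of genomic sequence on each side of the ligated adaptors.

    left_genomic / right_genomic are the sequences immediately flanking the
    adaptor region in the circularized molecule; everything further away is
    treated as removed by the digest.
    """
    left_tag = left_genomic[-tag_len:]   # bases adjacent to the adaptor, left side
    right_tag = right_genomic[:tag_len]  # bases adjacent to the adaptor, right side
    return left_tag + adaptor + right_tag


# Placeholder sequences for illustration only.
fragment = mmei_paired_tag("A" * 50 + "C" * 20, "GGGG", "T" * 20 + "A" * 50)
```

The resulting fragment has a known middle (the adaptors) and two short tags whose original separation in the genome is set by the size-selected fragment length.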
- the resulting fragment may be sequenced by any method available such as, for example, the methods provided in this disclosure (e.g., step 3H).
- the nucleic acids generated by the methods of the invention may be sequenced using one or more primers complementary to the end(s) of the sequence. That is, under the sequencing protocol described in Step 3H, a sequencing adaptor A and a sequencing adaptor B are ligated to the ends of fragments before they are sequenced. Since the end sequence of the fragment is known to be either sequencing adaptor A or B, a sequencing primer complementary to sequencing adaptor A or B may be used to sequence the fragment. Furthermore, a sequence in the middle of each fragment, comprising ligated adaptors, is known (see, e.g., 703 in Figure 7). Sequencing may also start from the middle using a primer complementary to this middle region.
- a sequencing primer from the end region and a sequencing primer from the middle region may be hybridized to a fragment to be sequenced concurrently (see Figure 9).
- One primer is protected while the other primer is not.
- the primer hybridized to the end is protected by a phosphate group.
- the first round of sequencing will commence from the nonprotected primer (Figure 9, middle primer).
- the elongation of the first primer may optionally be terminated, for example by incorporation of a complementary dideoxynucleotide.
- elongation of the first primer may have proceeded to the end of the template strand, making termination unnecessary.
- the fragmented starting DNA (Figure 10A) is ligated to adaptors with 3' CC overhangs and an optional internal Type IIS restriction endonuclease site.
- the ligated fragments cannot self ligate or self circularize because their ends are not compatible (not complementary). However, these fragments may be ligated using a linker with 5' GG overhangs on both sides (Figure 10B).
- the nucleic acid fragments may be purified from non-circular DNA by standard gel and column chromatography discussed above or by exonuclease digestion which cleaves uncircularized molecules.
- the resulting circular DNA (Figure 10D) may be cleaved with MmeI as in the other methods and the resulting DNA may be sequenced.
- the methods of the invention may be used to produce A/B adapted ssDNA (Figure 11, step 1).
- This single stranded fragment may be circularized by hybridization to an oligo comprising sequences complementary to the A/B adaptors (Figure 11, step 2) and ligated in the presence of ligase.
- the oligo may be used as a primer to facilitate rolling circle amplification of the circularized ssDNA (Figure 11, step 3).
- the rolling-circle amplified DNA may be cleaved as described for Method 1, Steps 1K and L (Figure 1L and M). Following amplification, standard library preparation and sequencing techniques may be applied to the product (Figure 11, step 4).
- Some embodiments of the present invention are based upon the surprising discovery that in a paired end sequencing experiment of the E. coli strain K12 genome, wherein the experimental protocol comprised the use of MmeI cleavage according to the methods described herein, the depth of read coverage across the genome varied greatly (Figure 20, "no carrier(-)"). By depth is meant the number of sequence reads mapping to substantially the same region of the genome. This depth variation was correlated to the density of MmeI sites across the genome (Figure 20). Unexpectedly and surprisingly, the inventors discovered that the addition of double stranded DNA known to contain MmeI sites (designated "(+)" in Figure 20), i.e. E. coli B Strain DNA ("EcoliBStrain(+)"), Salmon Sperm DNA ("SalSprmDNA(+)"), or a PCR amplification product known to contain MmeI sites ("AmpPosMmeI(+)"), greatly decreased and randomized the variation of depth of coverage across the genome.
- addition of double stranded DNA lacking MmeI sites (designated "(-)" in Figure 20), i.e. poly(dIdC) ("dIdC(-)"), or a PCR amplification product known to contain no MmeI sites ("AmpNegMmeI(-)"), did not change the pattern of variation of depth of coverage across the genome, as compared to the "no carrier" control.
- Table 1 shows depth of coverage statistics for E. coli K12. The top three samples (rows) had MmeI-positive carrier DNA added, while the bottom three samples had MmeI-negative carrier DNA added.
- Table 1 shows, in accordance with Figure 20, that the variation in depth of coverage across the E. coli K12 genome was greatly lowered by the addition of MmeI-positive carrier DNA (see Depth STDEV and Depth %CV values; smaller Depth STDEV and Depth %CV values are advantageous). This led to a more uniform distribution of paired end reads across the genome. This uniform distribution is advantageous.
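For readers unfamiliar with the two statistics named above, the following Python sketch shows how a depth standard deviation and percent coefficient of variation (%CV) could be computed from per-region read depths. The depth values are made-up placeholders, not data from Table 1, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def depth_stats(depths):
    """Return mean depth, standard deviation, and %CV (100 * stdev / mean)."""
    mu = mean(depths)
    sd = pstdev(depths)  # population standard deviation (assumed estimator)
    return {"mean": mu, "stdev": sd, "cv_percent": 100 * sd / mu}


# Hypothetical per-region depths, for illustration only.
uneven_coverage = [2, 30, 5, 40, 3]
even_coverage = [15, 17, 16, 14, 18]

print(depth_stats(uneven_coverage)["cv_percent"] > depth_stats(even_coverage)["cv_percent"])  # True
```

A lower %CV at a similar mean depth corresponds to the more uniform read distribution described as advantageous.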
- Table 2 shows the effect of paired end sequencing data obtained with MmeI-positive carrier DNA on the scaffolding of shotgun contigs.
- a lower number (19-25) of scaffolds i.e., larger scaffolds
- MmeI-positive carrier DNA columns "Stratagene SS dsDNA (+)", "E.Coli Bstrain (+)" and "Amplified Positive (+)"
- the use of MmeI-positive carrier DNA improves the genome assembly performance achieved by paired end sequencing performed according to the present invention.
- some embodiments of the invention include the use of double- stranded "carrier DNA".
- the carrier DNA is employed in a step that comprises DNA cleavage by the restriction endonuclease MmeI.
- the carrier DNA contains one or more MmeI sites. Endonucleolytic cleavage by MmeI occurs most efficiently when the number of moles of MmeI enzyme molecules about equals the number of moles of MmeI sites present in the DNA sample (Product Catalog of New England Biolabs, Ipswich, MA, USA).
- the number of MmeI sites can be difficult to estimate due to low DNA concentrations (typically in the order of nanograms to tens of nanograms) which are difficult and time consuming to measure reliably, and also due to variations in the number of MmeI sites based on the target DNA to be sequenced.
- an accurate computation of the amount of MmeI enzyme to be added to a reaction is problematic.
- some methods of the invention include the addition of an excess of carrier DNA (in relation to sample DNA).
- the amount of MmeI enzyme to be added to the reaction can be calculated based upon a known amount of carrier DNA, while the number of MmeI sites in the (circular) sample DNA becomes negligible. A measurement of the DNA concentration of the sample DNA therefore becomes unnecessary. This improves the speed and reduces cost and time required by the methods.
- the amount of carrier DNA may outweigh the amount of sample DNA by several fold to about tenfold, to about 100-fold, to about 1000-fold, or more.
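The reasoning above, that the enzyme amount can be set from the carrier DNA alone, rests on estimating the moles of MmeI sites contributed by a known mass of carrier. The Python sketch below counts sites in a sequence and converts a carrier mass to picomoles of sites. It assumes the recognition sequence TCCRAC (R = A or G, consistent with the TCCAAC site mentioned earlier) and an approximate average mass of 660 g/mol per base pair; both constants and all function names are assumptions for illustration.

```python
import re

# Assumed recognition sequence TCCRAC (R = A or G); the site is not palindromic,
# so both strands are scanned.
MMEI_SITE = re.compile(r"TCC[AG]AC")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one base pair (assumption)


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_mmei_sites(seq):
    """Count recognition sites on both strands of a double-stranded sequence."""
    seq = seq.upper()
    return len(MMEI_SITE.findall(seq)) + len(MMEI_SITE.findall(revcomp(seq)))


def pmol_of_sites(mass_ng, sites_per_kb):
    """Picomoles of sites in mass_ng of dsDNA with a given site density."""
    pmol_bp = mass_ng * 1000.0 / BP_MASS_G_PER_MOL  # picomoles of base pairs
    return pmol_bp * sites_per_kb / 1000.0


def sample_fraction_of_sites(carrier_ng, sample_ng, sites_per_kb):
    """Share of all sites that come from the sample, assuming equal site density."""
    carrier = pmol_of_sites(carrier_ng, sites_per_kb)
    sample = pmol_of_sites(sample_ng, sites_per_kb)
    return sample / (carrier + sample)
```

With, say, 2000 ng of carrier and 10 ng of sample at the same site density, the sample contributes under 1% of the sites, which illustrates why its concentration need not be measured.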
- two micrograms of sonicated double stranded salmon sperm DNA is added to the sample DNA with 2 units of MmeI and all required reagents (e.g.
- reaction temperature and duration may be adjusted within practical ranges.
- carrier DNA may be employed for analysis of mechanical manipulation of the sample, where it is desirable that the carrier DNA not interfere with other steps in the process.
- One such process is the amplification of the DNA sample, where a circular carrier DNA may be treated (e.g. by creating DNA damage) using methods known to the artisan having ordinary skill, rendering the DNA un-amplifiable but otherwise unaffected.
- pUC19 vector DNA may be irradiated using short wavelength ultraviolet light for about 45 minutes (typically between 30 and 60 minutes), which creates what are referred to as "pyrimidine dimers" in the DNA structure.
- Polymerase enzymes typically used for amplification processes are unable to "read through" the dimers on the template DNA, and thus the irradiated pUC DNA is un-amplifiable.
- damage may be generated by endogenous or exogenous processes.
- Some means of producing DNA damage include, but are not limited to, UV damage (UV-B, UV-A), alkylation/methylation, X-ray damage, hydrolysis (i.e. via thermal disruption causing depurination), and oxidative damage.
- the treated circular carrier DNA is added to the circularized target DNA sample to improve the characterization of the effectiveness of the linearization step, particularly linearization that employs mechanical fragmentation, such as by use of nebulization.
- between 1-4 µg of treated carrier pUC DNA may be added to a circularized target DNA sample and nebulized for 2 minutes at 30 psi to produce linear nucleic acid fragments with members comprising a pair distance of about 20 kb.
- the entire nebulized sample is tested using a LabChip 7500 test chip from Agilent Technologies to determine if the nebulization produced the desired results.
- Table 3 shows the relative percentage of carrier DNA present in a sample after amplification, which is proportional to the quantity of untreated carrier DNA added to the sample pre-amplification.
- the addition of 1 µg untreated carrier DNA results in a representation of the carrier DNA in 6% of the nucleic acid molecules in the amplified sample, and similarly the addition of 3 µg results in a 20% representation.
- Table 4 shows the relative percentage of treated carrier DNA present in a sample after amplification, where there is a substantial reduction from the untreated carrier DNA represented in table 3.
- the addition of 1 µg treated carrier DNA results in a representation of the carrier DNA in 0.02% of the nucleic acid molecules in the amplified sample, and similarly the addition of 3 µg results in a 0.06% representation.
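The size of the reduction reported for Tables 3 and 4 can be expressed as a fold change. This short Python sketch uses only the four percentages quoted above; the function and variable names are illustrative.

```python
def fold_reduction(untreated_percent, treated_percent):
    """How many times lower the carrier representation is after treatment."""
    return untreated_percent / treated_percent


# Percent of amplified molecules derived from carrier DNA, keyed by µg added.
untreated = {1: 6.0, 3: 20.0}   # values quoted for Table 3
treated = {1: 0.02, 3: 0.06}    # values quoted for Table 4

for micrograms in untreated:
    fold = fold_reduction(untreated[micrograms], treated[micrograms])
    print(f"{micrograms} ug carrier: about {round(fold)}-fold lower after treatment")
```

Both quantities of carrier show a reduction of roughly 300-fold, consistent with the treated carrier being largely un-amplifiable.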
- Some embodiments of the present invention also include methods for circularization of nucleic acid molecules via ligation.
- circularization of nucleic acid molecules is achieved by ligation at low nucleic acid concentrations. Low concentrations favor the desired intramolecular ligation reaction (i.e. circularization), which follows first-order reaction kinetics, over intermolecular events, which follow second-order (or higher-order) reaction kinetics (F. M. Ausubel, et al., (eds), 2001, Current Protocols in Molecular Biology, John Wiley & Sons Inc.).
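The kinetic argument can be made concrete with a small Python sketch: if the intramolecular rate scales with concentration and the intermolecular rate with its square, their ratio scales inversely with concentration. The rate constants below are arbitrary placeholders, not measured values, and the function name is hypothetical.

```python
def intra_to_inter_ratio(conc, k_intra=1.0, k_inter=1.0):
    """Ratio of first-order (circularization) to second-order (intermolecular) rates.

    conc is in arbitrary units; k_intra and k_inter are placeholder constants.
    """
    rate_intra = k_intra * conc        # first order in DNA concentration
    rate_inter = k_inter * conc ** 2   # second order in DNA concentration
    return rate_intra / rate_inter     # simplifies to k_intra / (k_inter * conc)


# A tenfold dilution improves the ratio tenfold, at the cost of a larger volume.
improvement = intra_to_inter_ratio(0.1) / intra_to_inter_ratio(1.0)
```

This is why dilution helps but never fully eliminates intermolecular products: the ratio improves only linearly as the reaction volume grows.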
- intermolecular ligation events cannot be prevented, and extreme dilution of the nucleic acid is not practical.
- intermolecular ligation reduces the yield of the desired intramolecular circularization events.
- intermolecular ligation products can be detrimental to downstream applications. In summary, the conventional approach has at least two major drawbacks. Firstly, the need to dilute the starting nucleic acid increases the reaction volume and associated reagent costs. The high dilution also makes efficient recovery of the reaction products difficult. Secondly, large numbers of intermolecular ligation events do occur, reducing the yield of the desired intramolecular ligation products.
- the invention includes methods which largely eliminate the issues associated with the conventional circularization approaches described above. For example, according to the present invention, there is no need to perform the ligation reaction at high dilution, i.e. at low nucleic acid concentrations.
- individual linear double-stranded DNA molecules having compatible ligatable ends such as blunt ends or staggered ("sticky") ends, are ligated in physically isolated reaction environments.
- An aqueous solution containing the DNA to be ligated and all reagents necessary for the ligation reaction (for example, DNA ligase, ligase buffer, ATP, etc.), is emulsified in oil, preferably in the presence of a surfactant that serves to stabilize the emulsion.
- the resulting water-in-oil emulsion contains microdroplets (microreactors), each containing zero, one, or more DNA molecules.
- the number of DNA molecules per microreactor can be adjusted by modifying the DNA concentration and the size of the microdroplets. For a skilled artisan, it is a matter of routine optimization to calculate appropriate conditions based on nucleic acid concentration, the size of the polynucleotides (length measured as the number of bases), and the average volume of the microdroplets.
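The routine calculation referred to above might look like the following Python sketch, which estimates the average number of double-stranded DNA molecules per droplet from the DNA concentration, the fragment length, and the droplet diameter. It assumes spherical droplets and an approximate average mass of 660 g/mol per base pair; the function name and units are illustrative choices.

```python
import math

AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # approximate average mass of one base pair (assumption)


def mean_molecules_per_droplet(conc_ng_per_ul, length_bp, droplet_diameter_um):
    """Average number of dsDNA molecules in one spherical droplet."""
    grams_per_litre = conc_ng_per_ul * 1e-3                     # 1 ng/uL = 1e-3 g/L
    molar = grams_per_litre / (length_bp * BP_MASS_G_PER_MOL)   # mol/L
    volume_litres = (math.pi / 6) * droplet_diameter_um ** 3 * 1e-15  # 1 um^3 = 1e-15 L
    return molar * AVOGADRO * volume_litres


# Halving the concentration halves the average occupancy; halving the
# droplet diameter reduces it eightfold.
occupancy = mean_molecules_per_droplet(0.05, 20_000, 10.0)
```

Either knob, concentration or droplet size, can therefore be used to bring the average occupancy toward the desired value.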
- An ideal microdroplet will contain a single ligatable DNA molecule.
- the number of DNA molecules per microreactor will vary depending, in part, on size variability of the microreactors and random distribution of the DNA molecules.
- some microreactors may contain no DNA molecule, some may contain one DNA molecule, and some may contain two or more DNA molecules.
- yield and cost can be balanced as needed by varying the average number of DNA molecules per microreactor.
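The trade-off between yield and cost can be explored with a simple occupancy model. The Python sketch below assumes that molecules are distributed among equally sized microreactors according to a Poisson distribution, which is a common modelling assumption and not something stated in this text; droplet size variability is ignored and the function name is hypothetical.

```python
import math


def occupancy(mean_per_droplet):
    """Fractions of droplets that are empty, singly occupied, or multiply occupied."""
    p_empty = math.exp(-mean_per_droplet)
    p_single = mean_per_droplet * p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


def fraction_of_molecules_alone(mean_per_droplet):
    """Share of DNA molecules that sit alone in their droplet (equals exp(-mean))."""
    return occupancy(mean_per_droplet)["single"] / mean_per_droplet


for mean in (0.1, 0.5, 1.0):
    print(mean, occupancy(mean))
```

Under this model, lowering the average occupancy increases the share of molecules that are alone in a droplet, and so can only circularize, but leaves more droplets and reagents unused.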
- the ligation mixture will be kept cold (for example, at 0-4 degrees Celsius) while it is being assembled and until the emulsification process is complete. This will prevent the ligation reaction from proceeding before the desired emulsion environment is formed, and will therefore prevent the formation of unwanted intermolecular bonds. Subsequently, the emulsified ligation reaction will be incubated at temperatures that are permissive of the ligation reaction. The incubation time may range from several minutes to an hour, to several hours, to overnight, or to 24 hours or more than a day.
- the ligation reaction may be halted to prevent undesirable intermolecular ligations in the combined ligation reactions.
- the ligation reaction may be halted by lowering the temperature to about 0-4 degrees Celsius (water ice), by heat inactivation of the ligase, by addition of EDTA, addition of a ligase inhibitor, etc. or any combination of such methods.
- RNA single stranded or double stranded DNA
- the ends of a linear single stranded polynucleotide molecule can be brought in direct juxtaposition by annealing to a capping oligonucleotide (also termed a bridging oligonucleotide) that has a portion complementary to each end of said linear single stranded polynucleotide molecule, as described in Step 1K of Method 1 (see Figure 1L and Figure 11).
- the emulsified ligation reaction may then be incubated at a suitable temperature.
- a suitable incubation temperature is 16 degrees Celsius, but a broad range of temperatures is acceptable.
- Conditions for ligation of DNA and other molecules are widely known in the art.
- One advantage of performing the circularization reaction in emulsion is that extended reaction times are neutral to, or even beneficial to, the success of the procedure. For example, in an ideal scenario with no more than one DNA molecule per microreactor, the incubation time can be extended until most DNA molecules have been circularized. In contrast, by using the conventional non-emulsion methods described above, prolonged incubation may lead to a higher proportion of intermolecular ligation products.
- Another advantage of the emulsion based ligation methods of the invention is the ability to allow the reaction to proceed for relatively long periods of time without increasing the occurrence of intermolecular ligation. Such increased incubation times allow for a greater number of circularized products without an increased risk of intermolecular ligations occurring. Furthermore, since the molecules are being isolated by physical means and not in a concentration dependent manner, the reaction volumes may be much lower (i.e. the nucleic acid concentration in the aqueous phase may be much higher) for the same number of ligation events, lowering the cost for the reagents and increasing the ease of processing the samples. The skilled artisan will understand that for ligation to occur in a given microdroplet, said microdroplet must contain sufficient reagents, including at least one molecule of ligase enzyme.
- the ligation reaction may be halted, and the emulsion is "broken" (also referred to as "demulsification" in the art).
- Demulsification may be followed by a nucleic acid isolation step that may be done by any suitable method for isolating nucleic acid.
- the unligated material may be removed by any method suitable for this task, one of which is to perform an exonuclease digestion of the sample.
- the exonuclease enzyme used will depend, in part, on the type of molecules being worked on (single stranded or double stranded, DNA or RNA), and other considerations, for example reaction temperatures conveniently incorporated into the process.
- the circularized material will have to be purified after the exonuclease treatment by one of the many procedures known in the art, such as phenol/chloroform extraction or any commercially available purification kit suitable for this purpose.
- the emulsion ligation methods of the invention are particularly useful in the circularization of long polynucleotide molecules, such as molecules longer than about 500 bases, longer than about 1000 bases, longer than about 2000 bases, longer than about 5000 bases, longer than about 10000 bases, longer than about 20,000 bases, longer than about
- the emulsion ligation methods described herein are useful in a wide variety of ligation reactions, whether they result in circularization or not.
- the emulsion ligation methods described above may be used in any ligation step of the various methods described herein, especially ligation reactions where circularization of the input nucleic acids is desired.
- Emulsions are heterogeneous systems of two immiscible liquid phases with one of the phases dispersed in the other as droplets of microscopic or colloidal size.
- Emulsions of the invention must enable the formation of microcapsules
- Emulsions may be produced from any suitable combination of immiscible liquids.
- the emulsion of the present invention has a hydrophilic phase
- microreactors contain reagents necessary for nucleic acid ligation.
- a plurality of microreactors may contain exactly one polynucleotide molecule each.
- a thermostable water-in-oil emulsion will be desirable, for example if heat inactivation of the ligase will be performed after the reaction, or if ligation is performed at elevated temperatures using a thermostable ligase (e.g. Taq DNA Ligase).
- the emulsion may be formed according to any suitable method known in the art. One method of creating an emulsion is described below, but any method for making an emulsion may be used.
- the size of the microcapsules may be adjusted by varying the flow rate and speed of the components. For example, in dropwise addition, the size of the drops and the total time of delivery may be varied.
- the microdroplets may be created within a microfluidic device, for example as described by Link et al. (Angew. Chem. Int. Ed., 2006, 45, 2556-2560), hereby incorporated by reference in toto.
- the microreactors should be sufficiently large to encompass sufficient nucleic acid and other ligation reagents. However, at least some of the microreactors should be sufficiently small so that a portion of the microreactor population contains a single self-ligatable polynucleotide molecule.
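
The size trade-off described above can be illustrated with a simple occupancy model. The sketch below assumes, purely for illustration and not as part of the disclosed method, that polynucleotide molecules distribute among microreactors according to a Poisson distribution; the function name `occupancy_fractions` and the example mean occupancies are hypothetical.

```python
import math

def occupancy_fractions(mean_per_droplet: float) -> dict:
    """Fractions of droplets holding 0, exactly 1, or 2+ molecules (Poisson assumption)."""
    if mean_per_droplet < 0:
        raise ValueError("mean occupancy must be non-negative")
    p0 = math.exp(-mean_per_droplet)      # probability of an empty droplet
    p1 = mean_per_droplet * p0            # probability of exactly one molecule
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}

# Hypothetical mean occupancies (molecules per droplet), chosen only for illustration.
for lam in (0.1, 0.5, 1.0):
    f = occupancy_fractions(lam)
    # Among occupied droplets, the share that can only undergo self-ligation.
    single_share = f["single"] / (1.0 - f["empty"])
    print(f"mean={lam:.1f} single={f['single']:.3f} multiple={f['multiple']:.3f} "
          f"single among occupied={single_share:.3f}")
```

Under this assumption, lowering the mean occupancy raises the share of occupied droplets that hold a single molecule, at the cost of more empty droplets.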
- the emulsion is heat stable.
- the droplets formed range in size from about 100 nanometers to about 500 micrometers in diameter, more preferably from about 1 micrometer to about 100 micrometers.
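
To give a feel for what the diameter range above means in terms of reaction volume, the following sketch converts droplet diameter to volume. It assumes perfectly spherical droplets, which is an idealization not stated in the source; the function name is hypothetical.

```python
import math

def droplet_volume_liters(diameter_um: float) -> float:
    """Volume of a spherical droplet, in liters, from its diameter in micrometers."""
    radius_um = diameter_um / 2.0
    volume_um3 = (4.0 / 3.0) * math.pi * radius_um ** 3
    return volume_um3 * 1e-15  # 1 cubic micrometer = 1e-15 liters

# Diameters taken from the ranges quoted above (0.1 um = 100 nm).
for d in (0.1, 1.0, 100.0, 500.0):
    print(f"{d:g} um diameter -> {droplet_volume_liters(d):.3e} L")
```

Because volume scales with the cube of the diameter, the preferred 1 to 100 micrometer range spans six orders of magnitude in droplet volume.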
- cross-flow fluid mixing, optionally in combination with an electric field, allows for control of droplet formation and uniformity of droplet size.
- the oil is a silicone oil.
- Emulsions of the invention may be stabilised by addition of one or more surface-active agents (emulsion stabilizers; surfactants).
- surfactants are also termed emulsifying agents and act at the water/oil interface to prevent (or at least delay) separation of the phases.
- Many oils and many emulsifiers can be used for the generation of water-in-oil emulsions; a recent compilation listed over 16,000 surfactants, many of which are used as emulsifying agents (Ash, M. and Ash, I. (1993) Handbook of industrial surfactants. Gower, Aldershot).
- Emulsion stabilizers used in the methods of the present invention include Atlox 4912, sorbitan monooleate (Span 80; ICI), polyoxyethylenesorbitan monooleate (Tween 80; ICI) and other recognized and commercially available suitable stabilizers. In various embodiments, the surfactant is provided at a v/v concentration in the oil phase of the emulsion of 0.5 to 50%, preferably 10 to 45%, more preferably 30-40%.
- chemically inert silicone-based surfactants such as silicone copolymers
- silicone copolymer used is polysiloxane-polycetyl-polyethylene glycol copolymer (Cetyl Dimethicone Copolyol) e.g. Abil®EM90 (Goldschmidt).
- the chemically inert silicone-based surfactant may be provided as the sole surfactant in the emulsion composition or may be provided as one of several surfactants. Thus, a mixture of different surfactants may be used.
- one surfactant used is Dow Corning® 749 Fluid (used at 1-50%, preferably 10 to 45%, more preferably 25-35% w/w).
- one surfactant used is Dow Corning® 5225C Formulation Aid (used at 1-50%, preferably 10 to 45%, more preferably 35-45% w/w).
- the oil/surfactant mixture consists of: 40% (w/w) Dow Corning®
- the methods of the invention provide a plurality of benefits and advantages over current methods.
- One advantage of the current method over the prior art is that cloning and propagation of the prepared fragments in a eukaryotic or prokaryotic host is not required. This is especially useful where the target sequence comprises multiple repeats that may rearrange during propagation as an episome in a host cell.
- Another advantage of the disclosed method is that it can facilitate genome assembly by providing not only contig sequences, but also the end sequences and orientation of the end sequences of long contigs, which may have a length of over 100 bp, over 300 bp, over 500 bp, over 1 kb, over 5 kb, over 10 kb, over 100 kb, over 1 Mb, over 10 Mb, or larger.
- This sequence information and orientation information may be used to facilitate genome assembly, and provide gap closure.
- paired end reads provide a second level of confidence in the assembly of a genome. For example, if paired end sequencing and regular contig sequencing are in agreement about a DNA sequence, then the level of confidence in that sequence is increased. Alternatively, if the two sets of sequence data contradict each other, then the confidence is reduced and more analysis and/or sequencing would be necessary to locate the source of the inconsistency.
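
The consistency check described above can be sketched in code. This is a minimal illustration under stated assumptions, not the method of the source: the `MatePair` record, the `classify_pair` function, the expected strand orientation and the tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class MatePair:
    contig_a: str
    pos_a: int
    strand_a: str   # "+" or "-"
    contig_b: str
    pos_b: int
    strand_b: str

def classify_pair(pair, expected_insert, tolerance, expected_strands=("+", "-")):
    """Label a mapped pair as 'consistent', 'inconsistent', or 'spanning'."""
    if pair.contig_a != pair.contig_b:
        # Ends on different contigs: a candidate link for ordering contigs or closing a gap.
        return "spanning"
    # Order the two ends by position along the contig.
    first, second = sorted([(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)])
    distance = second[0] - first[0]
    strands_ok = (first[1], second[1]) == tuple(expected_strands)
    distance_ok = abs(distance - expected_insert) <= tolerance
    return "consistent" if strands_ok and distance_ok else "inconsistent"

# Hypothetical pairs with an assumed 3000 bp insert and 500 bp tolerance.
pairs = [
    MatePair("c1", 100, "+", "c1", 3100, "-"),
    MatePair("c1", 100, "+", "c1", 9100, "-"),
    MatePair("c1", 100, "+", "c2", 400, "-"),
]
for p in pairs:
    print(classify_pair(p, expected_insert=3000, tolerance=500))
```

Pairs labelled inconsistent would flag regions of the assembly that need further analysis, while spanning pairs carry the orientation and distance information used for gap closure.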
- the presence or absence of open reading frames in paired end reads also provides directions as to the location of open reading frames. For example, if both sequenced ends of a contig contain an open reading frame, there is a chance that the complete contig is an open reading frame. This can be confirmed by standard sequencing techniques. Alternatively, with the knowledge of the two ends, specific PCR primers may be constructed to amplify the two ends and the amplified region may be sequenced to determine the presence of open reading frames. The methods of the invention will also improve the understanding of genome organization and structure. Because paired end sequencing has the ability to span regions that are difficult to sequence, a genomic structure may be deduced even if these regions are not sequenced. The difficult to sequence regions may be, for example, repeat regions and regions of secondary structure. In this case, the number and location of these difficult regions can be mapped in a genome even if the sequences of these regions are not known.
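
As a rough illustration of the open reading frame check mentioned above, the sketch below reports which forward reading frames of an end read are free of stop codons. It is a simplification made for the example: only the given strand is scanned, the standard stop codons TAA, TAG and TGA are assumed, and the function name and toy reads are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_frames(read: str) -> list:
    """Return the forward reading frames (0, 1, 2) of a read that contain no stop codon."""
    read = read.upper()
    frames = []
    for frame in range(3):
        # Split the read into complete codons starting at this frame offset.
        codons = [read[i:i + 3] for i in range(frame, len(read) - 2, 3)]
        if codons and not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames

# Toy end reads, not taken from the source.
left_end, right_end = "ATGAAACCC", "ATGAAATAG"
print(open_frames(left_end), open_frames(right_end))
```

If both end reads keep at least one frame open, the intervening region is a candidate for the confirmation steps described above.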
- the methods of the invention also allow the haplotyping of a genome over an extended distance.
- specific primers may be made to amplify regions of a genome containing two SNPs linked by a long distance.
- the two ends of this amplified region may be sequenced, using the methods of the invention, to determine the haplotypes without sequencing the nucleic acid between the two SNPs.
- This method is especially useful where the two SNPs span a region that is uneconomical to sequence. These regions include long regions, regions with repeats, or regions of secondary structure.
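
The haplotyping idea above amounts to counting which allele combinations are observed together on the two ends of the same molecule. The following is a minimal sketch of that bookkeeping; the function name, allele sets and observations are hypothetical and are not data from the source.

```python
from collections import Counter

def phase_snps(end_read_pairs, snp1_alleles, snp2_alleles):
    """Count allele combinations seen on the two ends of the same molecule."""
    counts = Counter()
    for allele_at_snp1, allele_at_snp2 in end_read_pairs:
        # Skip pairs where either end shows an unexpected or uncalled base.
        if allele_at_snp1 in snp1_alleles and allele_at_snp2 in snp2_alleles:
            counts[(allele_at_snp1, allele_at_snp2)] += 1
    return counts

# Toy observations: (base at first SNP, base at second SNP) for each paired end read.
observed = [("A", "T"), ("A", "T"), ("G", "C"), ("A", "T"), ("G", "C"), ("A", "N")]
counts = phase_snps(observed, {"A", "G"}, {"T", "C"})
print(counts.most_common())
```

The most frequent combinations suggest the haplotypes present, without any sequence from the region between the two SNPs.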
- FIG. 7A shows nucleic acids ligated to sequencing primers A and B in a format ready for sequencing. Some of the nucleic acids are contaminating nucleic acids which do not contain two ends of a single contig region (701). Nucleic acid fragments containing both ends of a contig are denoted as 702. Since nucleic acid 702 is the sole species of nucleic acid that comprises biotin, this species may be purified using a streptavidin bead (Figure 7B). This species is ready for sequencing after purification. By using affinity purification, the fraction of sequences that yield useful information may be substantially increased.
- endonucleolytic cleavage by EndoV of any double-stranded DNA containing opposite strand inosines can produce single stranded overhangs (sticky ends), wherein the overhangs may have virtually any nucleotide sequence.
- the invention also includes polynucleotide designs and methods substantially similar to Figure 14, but without a hairpin.
- the methods and compositions of the invention as depicted in Figure 14, with or without hairpins, as described above will be useful in a large number of molecular biology and recombinant DNA techniques in which the introduction of unique endonuclease sites is desirable. Such techniques include, but are not limited to, the construction of DNA and cDNA libraries, various subcloning strategies, or any methodology that benefits from unique endonuclease sites in primers, adaptors, or linkers.
- the paired-end nucleic acid constructs produced by any of the methods described herein may be sequenced by any sequencing method known in the art. Standard sequencing methods such as Sanger sequencing or Maxam-Gilbert sequencing are widely known in the art. Sequencing may also be performed, for example, by using the automated sequencing method known as 454 Sequencing™ developed by 454® Life Sciences Corporation (Branford, CT, USA), which is described, for example, in U.S. Patent Nos. 7,323,305 and 7,244,567, and U.S. Patent Application Serial Nos. 10/767,894, filed January 28, 2004; and 10/767,899, filed January 28, 2004.
- biotin avidin or streptavidin
- a binding pair may be any two molecules that show specific binding to each other and include, at least, binding pairs such as FLAG/anti-FLAG antibody, biotin/avidin, biotin/streptavidin, receptor/ligand, antigen/antibody, polyHIS/nickel, protein A/antibody and derivatives thereof. Other binding pairs are known and are published in the literature.
- Oligonucleotides used in the experiments are designed and synthesized as follows.
- Capture element oligonucleotides, shown in the top part of Figure 3A, are designed to include UA3 adaptors and keys. A NotI site is located between the adaptors. The complete construct (the capture element) may be created using nested oligos and PCR. The sequence of the final product is synthesized and cloned.
- Type IIS capture fragment oligonucleotides, shown in the bottom part of Figure 3A, are similar to the capture fragment described above except that sequences representing a type IIS restriction endonuclease site (e.g., MmeI) are included in the capture fragment after the key sequence.
- type IIS restriction endonucleases cleave DNA at various distances from the recognition site; in the case of MmeI, at 20/18 bases.
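
The 20/18 cleavage geometry can be made concrete with a short sketch. The recognition sequence used here, 5'-TCCRAC-3' (R = A or G), is general knowledge about MmeI and is not stated in the source; the function name, the toy sequence and the choice to scan only the given strand are simplifying assumptions for illustration.

```python
import re

MMEI_SITE = re.compile(r"TCC[AG]AC")  # MmeI recognition sequence, R = A or G (assumed, see lead-in)

def mmei_cut_positions(top_strand: str, top_offset: int = 20, bottom_offset: int = 18):
    """Return (top_cut, bottom_cut) for each MmeI site found on the given strand.

    Cut positions are 0-based indices into the top strand, located top_offset
    and bottom_offset bases after the end of the recognition site. Sites too
    close to the 3' end to be cut are skipped.
    """
    cuts = []
    for match in MMEI_SITE.finditer(top_strand.upper()):
        top_cut = match.end() + top_offset
        bottom_cut = match.end() + bottom_offset
        if top_cut <= len(top_strand):
            cuts.append((top_cut, bottom_cut))
    return cuts

# Toy sequence: 2 flanking bases, the site, 20 downstream bases, 4 extra bases.
seq = "GG" + "TCCGAC" + "ACGTACGTACGTACGTACGT" + "TTTT"
site_end = seq.index("TCCGAC") + 6
for top_cut, bottom_cut in mmei_cut_positions(seq):
    tag = seq[site_end:top_cut]  # bases captured downstream of the site on the top strand
    print(top_cut, bottom_cut, len(tag), top_cut - bottom_cut)
```

The difference between the two cut positions corresponds to the two-base 3' overhang left by a 20/18 cut, which is why adaptors with a two-base degenerate overhang can be ligated to the cleaved ends.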
- a short adaptor capture fragment oligonucleotide was designed to contain SAD1 adaptors and keys (Figure 3B). A NotI site is also situated between the adaptors. This oligonucleotide may be synthesized with an MmeI type IIS restriction endonuclease cleavage site after the key sequence (see Figure 3B, short adaptor capture fragment (type IIS)).
- E. coli K12 DNA (20 µg) in 100 µl was hydrosheared on speed 10 for 20 cycles using the standard HydroShear assembly (Genomic Solutions, Ann Arbor, MI, USA).
- a methylation reaction was performed on the sheared DNA by adding 50 µl of DNA (5 µg), 34.75 µl of H2O, 10 µl of methylase buffer, 0.25 µl of 32 mM SAM, and 5 µl of EcoRI methylase (40,000 units/ml, New England Biolabs (NEB), Ipswich, MA, USA). The reactions were incubated for 30 minutes at 37°C.
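
The arithmetic behind the methylation reaction above can be checked with a short script. The volumes and stock concentrations are those listed in the text; the helper function and variable names are hypothetical, and the calculation is only the ordinary dilution formula.

```python
# Component volumes in microliters, as listed in the methylation reaction above.
components_ul = {
    "sheared DNA": 50.0,
    "H2O": 34.75,
    "methylase buffer": 10.0,
    "32 mM SAM": 0.25,
    "EcoRI methylase": 5.0,
}

def final_concentration(stock, volume_ul, total_ul):
    """Dilution: C_final = C_stock * V_added / V_total (same units as the stock)."""
    return stock * volume_ul / total_ul

total_ul = sum(components_ul.values())
sam_final_mM = final_concentration(32.0, components_ul["32 mM SAM"], total_ul)
# 40,000 units/ml stock; convert the added volume from microliters to milliliters.
methylase_units = 40_000 * components_ul["EcoRI methylase"] / 1000.0

print(f"total volume: {total_ul:g} ul")
print(f"final SAM: {sam_final_mM * 1000:g} uM")
print(f"EcoRI methylase added: {methylase_units:g} units")
```

The same helper can be reused for the other reaction mixtures in this example by swapping in their component volumes.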
- methylated DNA was purified using a Qiagen MinElute PCR Purification column, according to the manufacturer's instructions. The purified DNA was eluted from the column with 10 µl of EB buffer.
- the sheared, methylated DNA was subjected to a polishing step to create sheared material having blunt ends.
- DNA (10 µl) was added to a reaction mixture containing 13 µl of H2O, 5 µl of 10X polishing buffer, 5 µl of 1 mg/ml bovine serum albumin, 5 µl of 10 mM ATP, 3 µl of 10 mM dNTPs, 5 µl of 10 U/µl T4 polynucleotide kinase, and 5 µl of 3 U/µl T4 DNA polymerase.
- the reactions were incubated for 15 minutes at 12°C, after which the temperature was raised to 25°C for an additional 15 minutes.
- the reactions were subsequently purified on a Qiagen MinElute PCR purification column according to the manufacturer's instructions.
- the hairpin adaptor was ligated to the sheared, blunt-end DNA fragments by adding 10 µl of sheared DNA (5 µg), 17.5 µl of H2O, 50 µl of 2X Quick Ligase Buffer, 20 µl of 10 µM Hairpin Adaptor, and 2.5 µl of Quick Ligase (T4 DNA Ligase, NEB).
- the reactions were incubated at 25°C for 15 minutes, after which the ligated fragments were selected by adding to the mixture 2 µl of λ exonuclease, 1 µl of Rec J (30,000 units/ml, NEB), 1 µl of T7 exonuclease (10,000 units/ml, NEB), and 1 µl of exonuclease I (20,000 units/ml, NEB).
- the reactions were incubated at 37°C for 30 minutes, after which the samples were purified on a Qiagen MinElute PCR Purification column.
- the treated DNA was then passed through an Invitrogen Purelink column according to the manufacturer's instructions and eluted from the column in a volume of 50 µl.
- the ligated, exonuclease-treated DNA was subjected to digestion by EcoRI. Reactions containing 50 µl of DNA, 30 µl of H2O, 10 µl of EcoRI buffer, and 10 µl of EcoRI (20,000 units/ml) were incubated at 37°C overnight. The cleaved products were purified using a Qiagen QiaQuick column according to the manufacturer's instructions. The cleaved products were ligated once more to generate closed circular DNA in reactions containing 50 µl of DNA, 20 µl of Buffer 4 (New England Biolabs), 2 µl of 100 mM ATP, 123 µl of H2O, and 5 µl of ligase (as above).
- the ligation reactions were incubated at 25°C for 15 minutes, after which they were subjected to another round of exonuclease treatment by adding to the mixture 1 µl of λ exonuclease (5,000 units/ml, NEB), 0.5 µl of Rec J (as above), 0.5 µl of T7 exonuclease (as above), and 0.5 µl of exonuclease I (as above).
- the exonuclease reactions were incubated at 37°C for 30 minutes, after which the sample was purified with a Qiagen MinElute PCR Purification column.
- the treated DNA was then subjected to Mme I digestion in a reaction mixture containing 10 µl of DNA, 78.75 µl of H2O, 10 µl of Buffer 4 (New England Biolabs), 0.25 µl of SAM, and 0.5 µl of Mme I (2,000 units/ml, NEB).
- the reactions were digested with Mme I for 60 minutes at 37°C, then purified on a Qiagen QiaQuick column that was buffered with a final concentration of 0.1% of 3 M sodium acetate.
- the column was washed with 700 µl of 8.0 M guanidine HCl and the sample was added to the column according to the manufacturer's instructions.
- the DNA was eluted in 30 µl of EB buffer, and diluted to a final volume of 100 µl.
- Streptavidin magnetic beads (50 µl; Dynal Dynabeads M270, Invitrogen, Carlsbad, CA, USA) were prepared by washing with 2X bead binding buffer and suspending the beads in 100 µl of 2X bead binding buffer, after which 100 µl of the DNA sample was added to the beads and mixed for 20 minutes at room temperature. The beads were washed twice in wash buffer.
- the SAD7 adaptor set (A/B set, wherein the single stranded oligonucleotides SAD7Ftop and SAD7Fbot are annealed to form the A adaptor, and the single stranded oligonucleotides SAD7Rtop and SAD7Rbot are annealed to form the B adaptor)
- SAD7Ftop 5'-CCGCCCAGCATCGCCTCAGNN-3' (SEQ ID NO:51);
- SAD7Fbot 5'-CTGAGGCGATGCTGG-3' (SEQ ID NO:52);
- SAD7Rtop 5'-CCGCCCGAGCACCGCTCAGNN-3' (SEQ ID NO:53);
- SAD7Rbot 5'-CTGAGCGGTGCTCGG-3' (SEQ ID NO:54), wherein N is any of the 4 bases A, G, T or C
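
How the top and bottom oligonucleotides of each adaptor anneal can be checked by taking the reverse complement of the bottom strand and locating it within the top strand. The sketch below uses the four sequences listed above, with spaces introduced by text extraction removed; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

adaptors = {
    "A": ("CCGCCCAGCATCGCCTCAGNN", "CTGAGGCGATGCTGG"),  # SAD7Ftop, SAD7Fbot
    "B": ("CCGCCCGAGCACCGCTCAGNN", "CTGAGCGGTGCTCGG"),  # SAD7Rtop, SAD7Rbot
}

for name, (top, bottom) in adaptors.items():
    paired = reverse_complement(bottom)  # top-strand bases that pair with the bottom oligo
    start = top.find(paired)
    if start < 0:
        print(f"adaptor {name}: bottom strand does not anneal to top strand")
        continue
    five_prime_tail = top[:start]                 # unpaired bases at the 5' end of the top strand
    three_prime_tail = top[start + len(paired):]  # unpaired bases at the 3' end of the top strand
    print(f"adaptor {name}: duplex={paired} 5' tail={five_prime_tail} 3' tail={three_prime_tail}")
```

In both adaptors the unpaired 3' end of the top strand is the degenerate NN, which would be compatible with the two-base overhangs produced by a 20/18 type IIS cut.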
- a nucleotide fill-in reaction was performed by adding to the beads a mixture containing 40 µl of H2O, 5 µl of 10X Fill-in buffer, 2 µl of 10 mM dNTPs, and 3 µl of Fill-in polymerase (Bst DNA polymerase, 8,000 units/ml, NEB). The reaction was incubated at 37°C for 20 minutes, and the beads were washed twice in wash buffer. The beads were then suspended in 25 µl of TE buffer.
- the DNA bound to beads was then subjected to PCR in reaction mixtures containing 30 µl of H2O, 5 µl of 10X Advantage 2 Buffer, 2 µl of 10 mM dNTPs, 1 µl of 100 µM forward primer (SAD7FPCR: 5'-Bio-CCGCCCAGCATCGCC-3' (SEQ ID NO:55)), 1 µl of 100 µM reverse primer (SAD7RPCR: 5'-CCGCCCGAGCACCGC-3' (SEQ ID NO:56)), 10 µl of DNA bound to beads, and 1 µl of Advantage 2 polymerase mix (Clontech, Mountain View, CA, USA).
- PCR was carried out using the following program: (a) 4 minutes at 94°C, (b) 15 seconds at 94°C, (c) 15 seconds at 64°C, wherein steps (b) and (c) are carried out for 19 cycles, (d) 2 minutes at 68°C, after which the reactions were held at 14°C.
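
The cycling program above can be written down as data and its programmed run time estimated. This is only a sketch: the function name is hypothetical, and the estimate ignores temperature ramping and the final hold, neither of which is specified in the text.

```python
def program_seconds(initial_denature, cycle_steps, cycles, final_extension):
    """Total programmed time in seconds, ignoring ramp times and the final hold."""
    return initial_denature + cycles * sum(cycle_steps) + final_extension

# (a) 4 min at 94 C; (b) 15 s at 94 C and (c) 15 s at 64 C for 19 cycles; (d) 2 min at 68 C.
total_s = program_seconds(
    initial_denature=4 * 60,
    cycle_steps=(15, 15),
    cycles=19,
    final_extension=2 * 60,
)
print(f"{total_s} s ({total_s / 60:.1f} min)")
```

Changing `cycles` gives the corresponding estimate for other amplification programs that use the same step durations.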
- the PCR products were purified using a Qiagen MinElute PCR Purification column, and then the purified products were run on a 1.5% agarose gel at 5 volts per centimeter to detect the presence of a 120 bp product. The 120 bp fragment was excised from the gel and recovered using a Qiagen MinElute gel extraction protocol.
- the 120 bp fragment was eluted in 18 µl of EB buffer.
- the double-stranded products were bound to streptavidin beads and washed twice with bead wash buffer.
- the single stranded products were eluted in 125 mM NaOH, and purified on a Qiagen MinElute PCR purification column. This material was then sequenced using standard 454 Life Sciences Corporation (Branford, CT, USA) sequencing methods on 454 Life Sciences Corporation automated sequencing systems.
- the purified sheared DNA was subjected to blunt-end polishing in a reaction mixture containing 23 µl of DNA, 5 µl of 10X polishing buffer, 5 µl of 1 mg/ml bovine serum albumin, 5 µl of 10 mM ATP, 3 µl of 10 mM dNTPs, 5 µl of 10 U/µl T4 polynucleotide kinase, and 5 µl of 3 U/µl T4 DNA polymerase.
- the reactions were incubated for 15 minutes at 12°C, after which the temperature was raised to 25°C for another 15 minutes.
- the reactions were subsequently purified on a Qiagen MinElute PCR Purification column according to the manufacturer's instructions.
- Ligation of the non-hairpin adaptor was carried out using 2 µg of the sheared, purified DNA in a reaction mixture containing 25 µl of 2X Quick Ligase buffer, 18.5 µl of 10 µM non-hairpin adaptor, and 2.5 µl of Quick Ligase (as above).
- the ligation reaction was incubated at 25°C for 15 minutes, after which the sample was passed through a Sephacryl S-400 spin column, followed by a Qiagen MinElute PCR Purification column. The DNA was then eluted from the column with 10 µl of EB buffer.
- the purified, ligated DNA was then subjected to a kinase reaction, wherein the mixture contained 13 µl of H2O, 25 µl of 2X buffer, 10 µl of DNA, and 2 µl of 10 U/µl T4 polynucleotide kinase.
- the reactions were incubated at 37°C for 60 minutes, after which the samples were run on a 1% agarose gel at 5 volts per cm. Bands between 1500 and 4000 bp were excised from the gel and recovered using a Qiagen MinElute gel extraction protocol.
- the purified DNA was subjected to another round of ligation to generate circular DNA in reaction mixtures containing 18 µl of DNA, 20 µl of Buffer 4 (New England Biolabs), 2 µl of ATP, 150 µl of H2O, and 10 µl of ligase (as above).
- the reactions were incubated for 15 minutes at 25°C, after which a mixture containing 2 µl of λ exonuclease (as above), 1 µl of Rec J (as above), 1 µl of T7 exonuclease (as above) and 1 µl of exonuclease I (as above) was added, and the reactions were incubated for 30 minutes at 37°C.
- the DNA was purified on a Qiagen MinElute PCR Purification column and eluted with 20 µl of EB buffer.
- the purified ligated DNA was then added to a mixture containing 68.6 µl of H2O, 10 µl of Buffer 4 (New England Biolabs), 0.2 µl of SAM, and 1 µl of Mme I restriction endonuclease (as above).
- the DNA was cleaved at 37°C for 30 minutes, after which the DNA was purified using a Qiagen QiaQuick column that was pre-buffered at a final concentration of 0.1% of 3 M sodium acetate and washed with 700 µl of 8.0 M guanidine HCl.
- the purified DNA was then eluted with 30 µl of EB buffer and the volume adjusted to 100 µl.
- Streptavidin magnetic beads (50 µl, as above) were washed with 2X bead binding buffer and suspended in 100 µl of bead binding buffer. The beads were then mixed with 100 µl of the DNA sample and allowed to bind to each other for 20 minutes at room temperature. Thereafter, the beads were washed twice in wash buffer and subjected to a ligation reaction with the SAD7 adaptor set (A/B set) (as above).
- a mixture containing 15 µl of H2O, 25 µl of Quick Ligase buffer, 5 µl of SAD7 adaptor, and 5 µl of Quick Ligase (as above) was added to the DNA bound to beads and incubated for 15 minutes at 25°C, after which the beads were washed twice in wash buffer.
- the DNA bound to beads was subjected to a fill-in reaction in a mixture containing 40 µl of H2O, 5 µl of 10X Fill-in buffer, 2 µl of 10 mM dNTPs, and 3 µl of Fill-in polymerase (as above). The reaction took place for 20 minutes at 37°C, after which the beads were washed twice in wash buffer and suspended in 25 µl of TE buffer.
- the DNA bound to beads was amplified in a reaction mixture containing 30 µl of H2O, 5 µl of 10X Advantage 2 buffer, 2 µl of dNTPs, 0.5 µl of 100 µM forward primer (as above), 0.5 µl of 100 µM reverse primer (as above), 10 µl of DNA bound to beads, and 1 µl of Advantage 2 enzyme (as above).
- the PCR reaction took place under the following conditions: (a) 4 minutes at 94°C, (b) 15 seconds at 94°C, (c) 15 seconds at 64°C, wherein steps (b) and (c) were repeated for 24 cycles, (d) 2 minutes at 68°C, after which the PCR reaction was held at 14°C.
- PCR products were purified on a Qiagen MinElute PCR Purification column and run on a 1.5% agarose gel at 5 volts per cm. A product of 120 bp was excised from the gel and recovered with the Qiagen MinElute gel extraction protocol. The DNA was subsequently eluted in 18 µl of EB buffer.
- the double-stranded DNA was bound to streptavidin beads and the beads were washed twice with wash buffer.
- the single-stranded DNA was then eluted with 125 mM NaOH and subsequently purified using a Qiagen MinElute PCR purification column.
- the purified material was subjected to a standard 454 emulsion and sequencing protocol.
- E. coli contigs were produced from normal 454 sequences from four 60x60 runs (approximately 1.3 million reads): 303 contigs of greater than 1000 bp were produced, which had an average size of 16,858 bp and a maximum size of 94,060 bp. Table 5 contains additional results achieved using the above procedure.
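
Summary figures of the kind quoted above (number of contigs above a length cutoff, average size, maximum size) can be computed from a list of contig lengths as sketched below. The function name is hypothetical and the lengths are toy values chosen for the example, not the data behind the figures in the text.

```python
def contig_summary(lengths, min_length=1000):
    """Count, mean and maximum of contig lengths greater than min_length (in bp)."""
    kept = [n for n in lengths if n > min_length]
    if not kept:
        return {"count": 0, "mean": 0.0, "max": 0}
    return {"count": len(kept), "mean": sum(kept) / len(kept), "max": max(kept)}

# Toy contig lengths in bp (illustrative only).
toy_lengths = [500, 1500, 2500, 1000, 94060]
print(contig_summary(toy_lengths))
```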
- the ends of the DNA fragments were polished using T4 DNA Polymerase and T4 PNK as follows in a microcentrifuge tube. Two reactions were performed for the 30 µg initial DNA sample.
- T4 PNK (10 U/µl): 5 µl. The reaction mixture was mixed well and incubated at 12°C for 15 minutes.
- loxP6 adaptors were added to the polished DNA fragments as follows (duplicate reactions were required): Roche 2X Rapid Ligase Buffer (#1), 50 µl; loxP6 Adaptors (20 µM each), 10 µl.
- Two loxP ligated DNA samples were loaded onto a large 0.5% agarose gel using a preparative comb (may use multiple wells if using a sample comb), and the gel was run overnight at 35V.
- the DNA fragments in the desired range, e.g. 20-25 kb, were collected the next morning and purified using QIAEX II according to the manufacturer's instructions.
- the site specific recombination to generate circularized molecules was performed using 150 - 300 ng DNA generated from fill-in reaction above.
- the reaction mixture was mixed well and incubated at 37°C for 45 minutes, then at 80°C for 10 minutes to inactivate the Cre recombinase.
- the reaction mixture was cooled to 10°C and the next step was performed immediately.
- the linear molecules were removed from the above reaction mixture by exonuclease treatment.
- the exonuclease incubation was immediately performed by adding the following reagents into the chilled excision reaction mixture described above.
- reaction mixture was mixed well and incubated at 37°C for 30 - 60 minutes. Then the exonucleases were immediately inactivated by incubation at 80°C for 20 minutes.
- the remainder of the procedure below is a modified version of the 454 library preparation method.
- the circularized molecules were broken into fragments of less than 1 kb by nebulization. 1 µl of 0.5 M EDTA and 1 µg of pUC19 were added to the heat inactivated reaction mixture above. The DNA was nebulized in Nebulization Buffer for 2 minutes at 44 psi. The nebulized DNA fragments were cleaned up using a MinElute kit according to the manufacturer's instructions. 9. Fragment End Polishing
- the reaction mixture was mixed well and incubated at 12°C for 15 minutes. Immediately thereafter the reaction mixture was incubated at 25°C for 15 minutes. The reaction was cleaned up using QiaQuick and eluted in 50 µl of EB.
- the polished DNA fragments were bound to streptavidin coated beads, e.g. Dynal M270 beads, according to the manufacturer's recommendations.
- the beads were washed three times with 500 µl of TE, and only the beads were kept.
- the reaction mixture was vortexed to mix and then
- the reaction mixture was mixed well and incubated at room temperature on a rotator for 15 minutes.
- the beads were washed at least 3 times with 500 µl of TE, and only the beads were kept.
- a fill-in reaction was performed for nick repair and to fill-in the 5' overhang introduced by 454 PE adaptors.
- the double stranded paired end library was preamplified as follows:
- the following program was used for the thermocycler:
- the desired library fragment size was selected by performing two rounds of SPRI beads cleaning as follows:
- DNA was eluted with 80 µl of EB.
- Figure 24 includes a graph that illustrates the pair distance distribution that is consistent with the target insert size of 24Kb and a longest detected pair distance of approximately 40Kb.
Abstract
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP09707782A EP2242855A1 (fr) | 2008-02-05 | 2009-02-04 | Séquençage d'extrémités par paire |
| CA2712426A CA2712426A1 (fr) | 2008-02-05 | 2009-02-04 | Sequencage d'extremites par paire |
| JP2010545396A JP2011510669A (ja) | 2008-02-05 | 2009-02-04 | ペアエンド配列決定の方法 |
| CN2009801131835A CN102027130A (zh) | 2008-02-05 | 2009-02-04 | 配对末端测序法 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US2631908P | 2008-02-05 | 2008-02-05 | |
| US61/026,319 | 2008-02-05 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2009098037A1 true WO2009098037A1 (fr) | 2009-08-13 |
Family
ID=40577731
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2009/000741 Ceased WO2009098037A1 (fr) | 2008-02-05 | 2009-02-04 | Séquençage d'extrémités par paire |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP2242855A1 (fr) |
| JP (1) | JP2011510669A (fr) |
| CN (1) | CN102027130A (fr) |
| CA (1) | CA2712426A1 (fr) |
| WO (1) | WO2009098037A1 (fr) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012029577A1 (fr) * | 2010-09-02 | 2012-03-08 | 学校法人 久留米大学 | Procédé de production d'adn circulaire à partir d'adn monomoléculaire |
| US9416358B2 (en) | 2011-08-31 | 2016-08-16 | Kurume University | Method for exclusive selection of circularized DNA from monomolecular DNA in circularizing DNA molecules |
| EP3378975A4 (fr) * | 2015-11-17 | 2018-12-26 | Nanjing Annoroad Gene Technology Co. Ltd | Procédé pour construire une bibliothèque d'adn pour le séquençage |
| WO2019032762A1 (fr) * | 2017-08-10 | 2019-02-14 | Rootpath Genomics, Inc. | Procédés pour améliorer le séquençage de polynucléotides à l'aide de codes-barres en utilisant une circularisation et une troncature de matrice |
| CN109790577A (zh) * | 2016-08-01 | 2019-05-21 | 豪夫迈·罗氏有限公司 | 从核酸测序制备物除去衔接子二聚体的方法 |
| US10858695B2 (en) | 2010-12-17 | 2020-12-08 | Life Technologies Corporation | Nucleic acid amplification |
| US10913976B2 (en) | 2010-12-17 | 2021-02-09 | Life Technologies Corporation | Methods, compositions, systems, apparatuses and kits for nucleic acid amplification |
| US11001815B2 (en) | 2010-12-17 | 2021-05-11 | Life Technologies Corporation | Nucleic acid amplification |
| EP3913053A1 (fr) * | 2017-04-23 | 2021-11-24 | Illumina Cambridge Limited | Compositions et procédés permettant d'améliorer l'identification d'échantillons dans des bibliothèques d'acides nucléiques indexés |
| US12247254B2 (en) | 2017-04-23 | 2025-03-11 | Illumina, Inc. | Compositions and methods for improving sample identification in indexed nucleic acid libraries |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6093498B2 (ja) * | 2011-12-13 | 2017-03-08 | 株式会社日立ハイテクノロジーズ | 核酸増幅方法 |
| EP2888371B1 (fr) * | 2012-08-24 | 2017-08-02 | Life Technologies Corporation | Procédés, compositions, systèmes, appareils et nécessaires utilisables en vue du séquençage d'extrémités appariées d'acides nucléiques |
| JP6422193B2 (ja) * | 2013-09-30 | 2018-11-14 | キアゲン ゲーエムベーハー | Dnaライブラリーの調製のためのdnaアダプター分子およびその生成法および使用 |
| EP3105349B1 (fr) * | 2014-02-11 | 2020-07-15 | F. Hoffmann-La Roche AG | Séquençage ciblé et filtrage uid |
| GB201416106D0 (en) * | 2014-09-11 | 2014-10-29 | Illumina Cambridge Ltd | Paired-end sequencing by attachment of the complementary strand |
| WO2016161236A1 (fr) * | 2015-04-02 | 2016-10-06 | The Jackson Laboratory | Procédé pour détecter des variations génomiques à l'aide de banques de paires d'appariement circularisées et d'un séquençage aléatoire |
| JP6931540B2 (ja) * | 2017-02-27 | 2021-09-08 | シスメックス株式会社 | 検体処理チップを用いた送液方法、検体処理チップの送液装置 |
| JP7333634B2 (ja) * | 2017-11-03 | 2023-08-25 | ユニヴァーシティ オブ ワシントン | コード化粒子を使用したデジタル核酸増幅 |
| CN112805380B (zh) * | 2018-09-21 | 2024-08-16 | 豪夫迈·罗氏有限公司 | 制备用于测序的模块化和组合核酸样品的系统和方法 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060166332A1 (en) * | 2000-02-25 | 2006-07-27 | Stratagene California | Method for ligating nucleic acids and molecular cloning |
| WO2007145612A1 (fr) * | 2005-06-06 | 2007-12-21 | 454 Life Sciences Corporation | Séquençage d'extrémités appariées |
-
2009
- 2009-02-04 WO PCT/EP2009/000741 patent/WO2009098037A1/fr not_active Ceased
- 2009-02-04 JP JP2010545396A patent/JP2011510669A/ja not_active Withdrawn
- 2009-02-04 EP EP09707782A patent/EP2242855A1/fr not_active Ceased
- 2009-02-04 CN CN2009801131835A patent/CN102027130A/zh active Pending
- 2009-02-04 CA CA2712426A patent/CA2712426A1/fr not_active Abandoned
Non-Patent Citations (3)
| Title |
|---|
| NAGY A: "CRE RECOMBINASE: THE UNIVERSAL REAGENT FOR GENOME TAILORING", GENESIS: THE JOURNAL OF GENETICS AND DEVELOPMENT, WILEY-LISS, NEW YORK, NY, US, vol. 26, no. 2, 1 February 2000 (2000-02-01), pages 99 - 109, XP008028883, ISSN: 1526-954X * |
| POSFAI GYORGY ET AL: "In vivo excision and amplification of large segments of the Escherichia coli genome", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 22, no. 12, 1 January 1994 (1994-01-01), pages 2392 - 2398, XP002300089, ISSN: 0305-1048 * |
| See also references of EP2242855A1 * |
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103119162B (zh) * | 2010-09-02 | 2016-09-07 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| CN103119162A (zh) * | 2010-09-02 | 2013-05-22 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| KR20130101508A (ko) * | 2010-09-02 | 2013-09-13 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| WO2012029577A1 (fr) * | 2010-09-02 | 2012-03-08 | Kurume University | Method for producing circular DNA from single-molecule DNA |
| JP5780527B2 (ja) * | 2010-09-02 | 2015-09-16 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| KR101583589B1 (ko) | 2010-09-02 | 2016-01-08 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| US8962245B2 (en) | 2010-09-02 | 2015-02-24 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| US9434941B2 (en) | 2010-09-02 | 2016-09-06 | Kurume University | Method for producing circular DNA formed from single-molecule DNA |
| US12351865B2 (en) | 2010-12-17 | 2025-07-08 | Life Technologies Corporation | Methods, compositions, systems, apparatuses and kits for nucleic acid amplification |
| US11725195B2 (en) | 2010-12-17 | 2023-08-15 | Life Technologies Corporation | Nucleic acid amplification |
| US10858695B2 (en) | 2010-12-17 | 2020-12-08 | Life Technologies Corporation | Nucleic acid amplification |
| US10913976B2 (en) | 2010-12-17 | 2021-02-09 | Life Technologies Corporation | Methods, compositions, systems, apparatuses and kits for nucleic acid amplification |
| US11001815B2 (en) | 2010-12-17 | 2021-05-11 | Life Technologies Corporation | Nucleic acid amplification |
| US11578360B2 (en) | 2010-12-17 | 2023-02-14 | Life Technologies Corporation | Methods, compositions, systems, apparatuses and kits for nucleic acid amplification |
| US9416358B2 (en) | 2011-08-31 | 2016-08-16 | Kurume University | Method for exclusive selection of circularized DNA from monomolecular DNA in circularizing DNA molecules |
| EP3378975A4 (fr) * | 2015-11-17 | 2018-12-26 | Nanjing Annoroad Gene Technology Co. Ltd | Method for constructing a DNA library for sequencing |
| CN109790577A (zh) * | 2016-08-01 | 2019-05-21 | F. Hoffmann-La Roche AG | Methods for removal of adaptor dimers from nucleic acid sequencing preparations |
| CN109790577B (zh) * | 2016-08-01 | 2024-02-09 | F. Hoffmann-La Roche AG | Methods for removal of adaptor dimers from nucleic acid sequencing preparations |
| US11519026B2 (en) | 2016-08-01 | 2022-12-06 | Roche Sequencing Solutions, Inc. | Methods for removal of adaptor dimers from nucleic acid sequencing preparations |
| EP3913053A1 (fr) * | 2017-04-23 | 2021-11-24 | Illumina Cambridge Limited | Compositions and methods for improving sample identification in indexed nucleic acid libraries |
| US11459610B2 (en) | 2017-04-23 | 2022-10-04 | Illumina Cambridge Limited | Compositions and methods for improving sample identification in indexed nucleic acid libraries |
| US12247254B2 (en) | 2017-04-23 | 2025-03-11 | Illumina, Inc. | Compositions and methods for improving sample identification in indexed nucleic acid libraries |
| WO2019032762A1 (fr) * | 2017-08-10 | 2019-02-14 | Rootpath Genomics, Inc. | Methods to improve the sequencing of polynucleotides with barcodes using circularisation and truncation of template |
Also Published As
| Publication number | Publication date |
|---|---|
| CA2712426A1 (fr) | 2009-08-13 |
| JP2011510669A (ja) | 2011-04-07 |
| CN102027130A (zh) | 2011-04-20 |
| EP2242855A1 (fr) | 2010-10-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7601499B2 (en) | | Paired end sequencing |
| US20090233291A1 (en) | | Paired end sequencing |
| WO2009098037A1 (fr) | | Paired end sequencing |
| AU2022204365B2 (en) | | Preserving genomic connectivity information in fragmented genomic DNA samples |
| EP2235217B1 (fr) | | Method of making a paired tag library for nucleic acid sequencing |
| EP1682680B1 (fr) | | Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof |
| JP2012223203A (ja) | | Paired end sequencing |
| WO2012044847A1 (fr) | | Nucleic acid adapters and uses thereof |
| CN102165073A (zh) | | Methods for nucleic acid mapping and identification of fine-structural variations in nucleic acids |
| WO2016135300A1 (fr) | | Methods for improving the efficiency of gene library generation |
| AU2021329302A1 (en) | | Sequence-specific targeted transposition and selection and sorting of nucleic acids |
| HK1128725A (en) | | Paired end sequencing |
| WO2025024703A1 (fr) | | Single-cell DNA-seq with double tagmentation |
| KR20230154078A (ko) | | Genomic library preparation and targeted epigenetic assays using Cas-gRNA ribonucleoproteins |
| HK40068506B (en) | | Preserving genomic connectivity information in fragmented genomic DNA samples |
| CN105602937A (zh) | | Methods for nucleic acid mapping and identification of fine-structural variations in nucleic acids |
| WO2017061861A1 (fr) | | Targeted locus amplification using cloning strategies |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 200980113183.5; Country of ref document: CN |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09707782; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2712426; Country of ref document: CA |
| | WWE | WIPO information: entry into national phase | Ref document number: 2010545396; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 2009707782; Country of ref document: EP |