HK1260391A1 - Polynucleotide mapping and sequencing - Google Patents
Polynucleotide mapping and sequencing
- Publication number: HK1260391A1 (application No. HK19120123.5A)
- Authority: HK (Hong Kong)
- Prior art keywords: nucleotide, biopolymer, sample, sequence, nicking
Description
This application is a divisional of application No. 201410584764, entitled "Polynucleotide mapping and sequencing", filed November 18, 2009.
RELATED APPLICATIONS
This application claims priority to U.S. Application No. 61/115,704, filed November 18, 2008, which is hereby incorporated by reference in its entirety for any and all purposes.
Technical Field
The disclosed invention relates to the fields of nucleic acid sequencing and molecular imaging. The disclosed invention also relates to the field of nanotechnology.
Background
As molecular biotechnology progresses, there is growing interest in analyzing ever smaller samples with ever greater resolution and accuracy. This interest is driven in part by the recognition that population heterogeneity can mask important features of a sample. Limited sample volume is also a consideration in some applications.
While existing techniques are theoretically capable of extracting important information from physically small samples, their effectiveness is limited by their ability to resolve structural features on such samples. Accordingly, there is a need in the art for methods and related devices that can obtain genomic information from single molecules or other small samples. The value of such methods would be increased if they could achieve resolution finer than the roughly 1000 bp (1 kb) attained by current techniques.
Disclosure of Invention
To address these challenges, the claimed invention first provides a method for analyzing the presence or relative position of one or more exons, the method comprising labeling first and second locations on a biopolymer with first and second labels, respectively, such that the first and second labels flank a first region of the biopolymer comprising at least one constant exon; linearizing the biopolymer; and correlating the distance between the first and second labels to the presence, absence, or relative position of an alternative exon in the first region of the biopolymer.
In a second aspect, the present invention provides a method of obtaining structural information about a DNA sample, comprising nicking a first double-stranded DNA sample using a sequence-specific nicking endonuclease; incorporating one or more dye-labeled nucleotides at two or more nicking sites generated by the nicking endonuclease; linearizing a portion of the first double-stranded DNA sample comprising at least two dye-labeled nucleotides; and recording the relative positions of the two or more dye-labeled nucleotides.
Also provided is a method of obtaining sequence information about a nucleic acid biopolymer, comprising binding a first fluorescently labeled sequence-specific probe having a first binding sequence to a single-stranded nucleic acid biopolymer; contacting the single-stranded nucleic acid biopolymer with a first terminator nucleotide carrying a first fluorescent label, a second terminator nucleotide carrying a second fluorescent label, a third terminator nucleotide carrying a third fluorescent label, and a fourth terminator nucleotide carrying a fourth fluorescent label; and linearizing and irradiating the nucleic acid biopolymer to determine the presence or relative position of the first, second, third, or fourth terminator nucleotide, or any combination thereof, adjacent to the first labeled sequence-specific probe.
The present invention also provides a method of obtaining structural information about a nucleic acid biopolymer, comprising contacting a double-stranded biopolymer with a nicking endonuclease to generate a first nicking site; contacting the first nicking site with a first terminator nucleotide carrying fluorescent label A, a second terminator nucleotide carrying fluorescent label B, a third terminator nucleotide carrying fluorescent label C, and a fourth terminator nucleotide carrying fluorescent label D; and linearizing and irradiating the double-stranded biopolymer to determine the relative position of the first, second, third, or fourth terminator nucleotide, or any combination thereof.
Further provided is a kit for performing multiplex hybridization, comprising a plurality of hybridization probes, each having a different color, and instructions for applying at least two of the hybridization probes to a nucleic acid sample and for linearizing and imaging at least one hybridized nucleic acid.
Drawings
The summary of the invention, as well as the following detailed description, is further understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary embodiments of the invention; however, the invention is not limited to the specific methods, compositions, and devices disclosed. Additionally, the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1A illustrates mapping statistics for the Nt.BstNBI nicking endonuclease, showing that an optical resolution of 100 bp significantly improves map accuracy and coverage;
FIG. 1B illustrates unique-mapping statistics for the Nt.BspQI nicking endonuclease, showing that an optical resolution of 100 bp has little effect on map accuracy and coverage;
FIG. 1C illustrates that a relatively fine map (1.5 kb) detects structural variation better than a relatively coarse map (16 kb);
FIG. 2A depicts the gene structure of MAPT;
FIG. 2B lists the size of each exon present in the MAPT gene (variable exons are shaded);
FIG. 2C illustrates a barcoding or mapping scheme using super-resolution imaging, applied to RNA exon splicing;
FIG. 2D illustrates a multiple barcode scheme;
FIG. 3 illustrates starting materials for sequencing;
FIG. 4 depicts the first cycle of a sequencing reaction;
FIG. 5 depicts the second sequencing cycle, continuing from FIG. 4;
FIG. 6 shows that the multiplex sequencing protocol significantly increases yield;
FIG. 7A depicts a 741bp PCR product model system for demonstrating SHRIMP resolution;
FIG. 7B shows the results of imaging labeled DNA molecules after linearization on a glass surface: the three Cy3 dye molecules are 30 nm and 60 nm apart, in good agreement with the 94 bp and 172 bp separations between the three Cy3 probes;
FIG. 8A depicts a 741bp PCR product model system for demonstrating SHRIMP and SHREC resolution;
FIG. 8B shows the results of imaging labeled DNA molecules after linearization on a glass surface: the Cy3-Cy5 pair distances were 37 ± 5 nm (expected 32 nm) and 91 ± 5 nm (expected 87 nm), and the Cy3-Cy3 pair distance was 56 ± 3 nm (expected 58 nm) (FIG. 4), showing excellent agreement;
FIG. 9 depicts an exemplary, non-limiting embodiment of the claimed method of ascertaining structural information about genetic material;
FIG. 10 depicts a second exemplary, non-limiting embodiment of the claimed method of ascertaining structural information about genetic material;
FIG. 11 depicts one non-limiting embodiment of the claimed process;
FIG. 12 depicts yet another non-limiting embodiment of the claimed method; and
FIG. 13 depicts the steps of assembling a parent sample and the effective barcode for the parent.
Detailed Description
The present invention may be understood more readily by reference to the following detailed description taken in conjunction with the accompanying drawings and examples, which form a part of this disclosure. It is to be understood that this invention is not limited to the specific devices, methods, applications, conditions, or parameters described and/or shown herein, and that the terminology used herein is for the purpose of describing particular embodiments by way of example only and is not intended to limit the claimed invention. Also, as used in this specification, including the claims, the singular forms "a" and "an" include the plural, and reference to a particular numerical value includes at least that particular value, unless the context clearly dictates otherwise. The term "plurality", as used herein, means more than one. When a range of values is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations by use of the antecedent "about," it will be understood that the particular value forms another embodiment. All ranges are inclusive and combinable.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments herein, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any subcombination. Moreover, recitations of ranges of values herein are inclusive of each and every value within the range.
In a first embodiment, the invention provides a method of analyzing the presence, or even the relative position, of one or more exons. Suitably, the method comprises labeling first and second locations on a biopolymer sample with first and second labels, respectively, such that the labels flank a first region of the biopolymer sample comprising at least one constant exon. The user then correlates the distance between the first and second labels with the presence or absence (or relative position) of an alternative exon (i.e., an exon that is not present in every mRNA) in the first region of the biopolymer. The biopolymer is suitably DNA complementary to the mRNA (cDNA); such DNA can be readily synthesized by a person of ordinary skill in the art.
In some embodiments, the first and second labels comprise the same fluorophore; in others, they comprise different fluorophores. A wide range of fluorophores is suitable for use in the present invention, including the Cy family of fluorophores. Other suitable fluorophores will be known to those skilled in the art; a list can be found, for example, at http://info.med.yale.edu/genetics/ward/tavi/FISHdyes2.html.
Suitably, the user relates the distance between the first and second labels to the presence or absence of one or more variable exons (or even to the relative positions of those exons) by comparing the distance between the labels on the biopolymer sample with the distance between labels flanking the same region on a biopolymer known to be free of variable exons. This comparison is suitably made after linearizing the region of the biopolymer that carries the fluorescent labels. Linearization of biopolymers is discussed in detail in U.S. patent application No. 10/484,293 (issued on 11/9 of 2009), which is incorporated herein by reference in its entirety for all purposes.
Suitably, establishing the correlation comprises comparing the distance between the first and second labels with the distance between the same first and second positions on a biopolymer that lacks the variable exon between those positions.
This is illustrated in FIG. 2C, in which the construct labeled "zero" lacks any variable exon. The construct labeled "2" contains variable exon 2, which is detected because it increases the separation between the Cy3 and Cy5 dyes to 342 bp, compared with only 255 bp in the exon-free construct shown at the top of the figure.
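The comparison described for FIG. 2C can be sketched as a small search: subtract the exon-free label spacing from the observed spacing, then list every combination of variable exons whose total length accounts for the difference. This is a minimal sketch assuming spacings are already expressed in base pairs; the function name, the 5 bp tolerance, and the sizes given for exons 3 and 10 in the usage are hypothetical placeholders, not values from FIG. 2B (only the 255 bp and 342 bp spacings come from the text, and 342 - 255 = 87 bp is the extra length attributed to exon 2).

```python
from itertools import combinations


def explain_spacing(baseline_bp, observed_bp, exon_sizes, tol_bp=5):
    """Return every subset of variable exons whose combined length explains the
    extra spacing between two flanking labels (hypothetical helper).

    baseline_bp: label spacing on a reference lacking all variable exons.
    observed_bp: label spacing measured on the sample.
    exon_sizes:  {exon_id: length_bp} for variable exons between the labels.
    tol_bp:      assumed measurement tolerance.
    """
    extra = observed_bp - baseline_bp
    names = sorted(exon_sizes)
    matches = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            total = sum(exon_sizes[name] for name in subset)
            if abs(total - extra) <= tol_bp:
                matches.append(subset)
    return matches


# Usage: exon sizes for 3 and 10 are placeholders, not FIG. 2B data.
sizes = {2: 87, 3: 120, 10: 93}
print(explain_spacing(255, 342, sizes))  # spacings from FIG. 2C
```

Returning every matching subset, rather than a single best guess, surfaces ambiguous cases, such as two variable exons of similar length, which additional labels can help resolve.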
FIG. 2B is a table showing the size of each exon present in the MAPT gene (variable exons are shown as shaded blocks). FIG. 2A illustrates in general terms the various splicing arrangements that may occur in the MAPT gene. As shown in this figure, exons 2, 3, and 10 are considered "variable" exons and may or may not be present in a given MAPT mRNA.
The user may also label third, fourth, or even more locations on the biopolymer with additional labels (for example, labeled nucleotides). Such additional labels may comprise the same fluorophores as the first and second labels or different ones. The user can then correlate the distance between the third label and the first label, between the third label and the second label, or both, with the presence or absence of a variable exon located in the corresponding interval. The correlation may also provide the relative position of one or more labels.
This is also shown in FIG. 2. In the construct labeled "2 + 10", the biopolymer contains variable exons 2 and 10, located in the intervals defined by the first, second, and third labels (reading from left to right). The user can determine the presence (or relative position) of these exons by comparing the distances between the labels on the "2 + 10" construct with those on the "zero" construct shown at the top of the figure.
In addition to gathering structural information about the biopolymer from the distances between labels, the user can also obtain structural information from the relative order of two or more probes, which is facilitated by using probes that carry fluorophores of different colors. For example, if three probes (red, yellow, and green) are used, a sequence to which the probes bind in the order red-yellow-green is structurally different from one to which they bind in the order yellow-red-green. Thus, the user can gather information about a sample by observing both the order in which the probes are bound along the sample and the distances between them.
Returning to the non-limiting example above, a user comparing two samples can, by interpreting both the relative order of the probes and the distances between them, identify differences between the samples in (1) the order in which certain nucleotide sequences appear (evidenced by a different probe order on different samples) and (2) the number of copies of a sequence, for example copy number variations, in a given sample (evidenced by certain probes being spaced farther apart on one sample than on another).
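A minimal sketch of the two comparisons just described, assuming each molecule carries exactly one probe of each color and that probe positions have already been measured in base pairs along the linearized molecule. The function name and input format are illustrative assumptions.

```python
def compare_probe_layouts(sample_a, sample_b):
    """Compare two linearized molecules carrying the same colored probes.

    Each sample is a list of (color, position_bp) detections; every color is
    assumed to appear exactly once on each molecule (a KeyError otherwise).
    Returns (order_a, order_b, spacing_change), where spacing_change maps each
    adjacent color pair in sample_a's order to the change in gap (b minus a).
    """
    order_a = [color for color, _ in sorted(sample_a, key=lambda d: d[1])]
    order_b = [color for color, _ in sorted(sample_b, key=lambda d: d[1])]
    pos_a, pos_b = dict(sample_a), dict(sample_b)
    spacing_change = {}
    for left, right in zip(order_a, order_a[1:]):
        gap_a = abs(pos_a[right] - pos_a[left])
        gap_b = abs(pos_b[right] - pos_b[left])
        spacing_change[(left, right)] = gap_b - gap_a
    return order_a, order_b, spacing_change


# Usage with made-up positions: the second molecule has red and yellow swapped.
a = [("red", 0), ("yellow", 400), ("green", 900)]
b = [("yellow", 0), ("red", 400), ("green", 900)]
print(compare_probe_layouts(a, b))
```

A changed color order points to a rearrangement; an unchanged order with a larger gap between two probes is consistent with, for example, additional copies of the intervening sequence.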
The labels are suitably spaced about 30 bp to about 1000 bp apart, or more suitably about 30 bp apart. As described elsewhere herein, several techniques (e.g., SHRIMP, FIONA, SHREC, or others known to those of ordinary skill in the art) can resolve labels separated by only hundreds or even tens of base pairs.
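To relate these base-pair spacings to the nanometer distances measured on linearized DNA (for example in FIGS. 7B and 8B), a common approximation is the rise of B-form DNA, about 0.34 nm per base pair. The helpers below assume fully extended B-form DNA; DNA stretched on a surface can deviate from this, so in practice the factor would be calibrated. The names are illustrative.

```python
# Approximate rise per base pair of B-form DNA, in nanometres (an assumption:
# real surface-stretched DNA may be over- or under-extended).
B_DNA_RISE_NM = 0.34


def bp_to_nm(bp, rise_nm=B_DNA_RISE_NM):
    """Contour length in nm for a given number of base pairs."""
    return bp * rise_nm


def nm_to_bp(nm, rise_nm=B_DNA_RISE_NM):
    """Number of base pairs corresponding to a measured distance in nm."""
    return nm / rise_nm


for spacing in (30, 94, 172, 1000):
    print(spacing, "bp ->", round(bp_to_nm(spacing), 1), "nm")
```

With this factor, 94 bp and 172 bp give about 32 nm and 58 nm, and the 30 bp to 1000 bp range above corresponds to roughly 10 nm to 340 nm.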
In another aspect, the present invention provides a method of obtaining structural information about a DNA sample. Suitably, the methods comprise nicking a first double stranded DNA sample using a sequence specific nicking endonuclease. Such "nickases" are known in the art and are available, for example, from New England Biolabs (www.neb.com).
The method suitably comprises incorporating one or more dye-labeled nucleotides at two or more nicking sites generated by the nicking endonuclease. Depending on the endonuclease and the sample being analyzed, nicking can produce one, two, or more nicking sites along the length of the sample. The labeled nucleotides are suitably incorporated into the biopolymer by a polymerase. In suitable embodiments, the labeled nucleotide is a terminator nucleotide, which prevents the polymerase from extending the chain further. The nucleotides may carry the same fluorophore label or different labels, depending on the needs of the user.
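Where nicking sites will fall can be predicted from a reference sequence by scanning both strands for the enzyme's recognition motif. This is a sketch: the function names and the (motif, offset) parameterization are hypothetical conventions, and the motif and offset in the usage are illustrative values that should be replaced with those from the chosen enzyme's documentation.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_all(seq, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    hits, start = [], seq.find(pattern)
    while start != -1:
        hits.append(start)
        start = seq.find(pattern, start + 1)
    return hits


def find_nick_sites(seq, motif, offset):
    """Predict nick positions for a nicking endonuclease on both strands.

    motif:  recognition sequence read 5'->3' on the strand that gets nicked.
    offset: bases from the first motif base to the nick on that strand
            (hypothetical parameterization; take it from the datasheet).
    Returns top-strand coordinates i, meaning "between seq[i-1] and seq[i]".
    """
    seq, motif = seq.upper(), motif.upper()
    m = len(motif)
    # Motif on the top strand: the nick lies `offset` bases to its right.
    top = [s + offset for s in find_all(seq, motif)]
    # Motif on the bottom strand shows up as its reverse complement on the top
    # strand; the bottom strand runs right-to-left, so the nick lies leftward.
    bottom = [s + m - offset for s in find_all(seq, reverse_complement(motif))]
    return {"top": top, "bottom": bottom}


# Usage with an illustrative 7-base motif and a nick one base past its end.
print(find_nick_sites("AAGCTCTTCAGGG", "GCTCTTC", 8))
```

Nick positions on the two strands are reported separately because only one strand is cut at each site; the labeled nucleotides are then incorporated at those positions.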
The method also suitably comprises linearizing a portion of the first double-stranded DNA sample comprising at least two dye-labeled nucleotides. Once the labeled DNA is linearized, the user may record or otherwise register the positions of the two or more dye-labeled nucleotides for further analysis.
One such analysis includes correlating the relative positions of two or more dye-labeled nucleotides with one or more structural features of the first double-stranded DNA sample. As shown in FIG. 9, this may involve determining the distance between two labels known to flank a target region, e.g., a region known to contain a certain mutation or copy number variation in some individuals. By comparing the inter-label distance on the sample with that on a control sample (or on a sample from another individual or group of individuals), the user can determine whether the subject being analyzed may (or may not) have a particular mutation.
In some embodiments, a "barcode" derived from the relative positions of labels on a biopolymer sample provides information about the position of a first double-stranded DNA sample within the primary double-stranded DNA sample from which it was derived. The term "barcode" refers to a set of signals (e.g., signals from fluorescent labels spaced apart from one another) that represents structural features of a sample (e.g., the distance between two labels can be correlated with the presence of additional copies of a gene in the region between them). A barcode may also be used to identify a particular sample when the set of signals from the labels on that sample is unique to it or otherwise distinguishable from the other samples under study.
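One simple way to represent such a barcode, assumed here for illustration, is the ordered list of gaps between consecutive labels on a molecule. Because a linearized molecule can lie in either direction, the hypothetical `barcodes_match` helper also checks the reversed barcode; the 50 bp tolerance is an arbitrary placeholder for measurement error.

```python
def barcode(positions_bp):
    """Ordered gaps (bp) between consecutive labels on one linearized molecule."""
    pts = sorted(positions_bp)
    return [b - a for a, b in zip(pts, pts[1:])]


def barcodes_match(bc1, bc2, tol_bp=50):
    """True if two barcodes agree gap-by-gap within tol_bp in either orientation.

    A stretched molecule may land in either direction, so the reversed barcode
    is also tried. tol_bp is an assumed measurement tolerance.
    """
    if len(bc1) != len(bc2):
        return False
    forward = all(abs(x - y) <= tol_bp for x, y in zip(bc1, bc2))
    reverse = all(abs(x - y) <= tol_bp for x, y in zip(bc1, reversed(bc2)))
    return forward or reverse


# Usage: the same molecule imaged in opposite orientations.
print(barcode([1200, 0, 500]))                  # gaps of 500 and 700 bp
print(barcodes_match([500, 700], [700, 510]))   # reversed, within tolerance
```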
For example, the user may determine that a portion of the barcode on a first sample taken from a "parent" sample overlaps with the barcode on a second sample taken from the same parent, indicating that the first and second samples share a common region of the parent. Such a parent sample can be digested into smaller oligonucleotides, which can themselves be analyzed by the various methods described herein; by barcoding the smaller oligonucleotides, the user can then determine their relative positions within the parent sample.
This is shown in FIG. 13, which depicts the following steps: a parent DNA sample is digested, barcodes are placed on the resulting digestion products, and the products are aligned according to their barcodes, suitably by computational methods, to assemble the parent and its effective barcode. The user can then associate the parent's barcode with, for example, a physiological condition of the subject. This is useful where the restriction enzyme used to digest the parent is known to isolate genomic regions that may contain copy number variations, exons, or other mutations; such a copy number variation, exon, or mutation can be detected by comparing the distance between two labels on the target region with the distance between two labels on a "control" or "standard" known to lack (or to have) the target mutation or exon.
As a non-limiting example, a user can barcode the digestion products of the parent sample with labels by the methods described herein and then use the barcodes to reassemble those products computationally into the parent. The user can then compare the parent's barcode with those of other known samples to determine one or more characteristics of the parent, such as copy number variations or the addition or deletion of exons. In this way, the user can perform a qualitative assessment of the parent sample by placing all digestion products and their barcodes in their proper context within the parent.
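The computational reassembly described above can be sketched as greedy overlap merging: find two fragments where the last few gaps of one barcode match the first few gaps of the other, splice them, and repeat. This is a minimal illustration under stated assumptions (label positions in base pairs measured from each fragment's own first label, all fragments in the same orientation, a hypothetical 50 bp tolerance, and at least two matching gaps required); practical optical-map assemblers handle missing labels, sizing error, and orientation far more carefully. All function names are hypothetical.

```python
def gaps(positions):
    """Distances between consecutive label positions (the barcode)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def overlap(a, b, tol_bp, min_gaps):
    """Largest k >= min_gaps such that the last k gaps of a match the first k
    gaps of b within tol_bp; 0 if there is no such overlap."""
    ga, gb = gaps(a), gaps(b)
    for k in range(min(len(ga), len(gb)), max(min_gaps, 1) - 1, -1):
        if all(abs(x - y) <= tol_bp for x, y in zip(ga[-k:], gb[:k])):
            return k
    return 0


def merge(a, b, k):
    """Shift b so its first label sits on the label of a where the shared
    k-gap run begins, then append b's labels that extend past a."""
    shift = a[len(a) - 1 - k] - b[0]
    return a + [p + shift for p in b[k + 1:]]


def assemble(fragments, tol_bp=50, min_gaps=2):
    """Greedily merge fragments (label positions, same orientation) by their
    best barcode overlap until no overlap remains."""
    frags = [sorted(f) for f in fragments]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, tol_bp, min_gaps)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break
        merged = merge(frags[best_i], frags[best_j], best_k)
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Usage: two digestion products whose barcodes share a 200 bp + 600 bp run.
print(assemble([[0, 300, 800, 1000, 1600], [0, 200, 800, 1300, 1700]]))
```

Requiring at least two matching gaps is a simple guard against spurious joins, since a single gap length is likely to recur by chance.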
The method may suitably further comprise nicking a second double-stranded DNA sample using a sequence-specific nicking endonuclease, incorporating one or more dye-labeled nucleotides at two or more nicking sites generated by the nicking endonuclease, linearizing a portion of the second double-stranded DNA sample comprising at least two dye-labeled nucleotides, and registering (e.g., recording or annotating) the relative positions of the two or more dye-labeled nucleotides.
These relative label positions (i.e., the barcodes) can, as described above, be used to determine how the first and second double-stranded DNA samples relate to each other within the primary double-stranded DNA sample from which both were derived.
In some embodiments, the user compares the relative positions of two or more dye-labeled nucleotides with the positions of the same dye-labeled nucleotides on a second double-stranded DNA sample contacted with the same nicking endonuclease. In this way, a user may compare barcodes on samples taken from different sources, enabling a qualitative comparison among multiple samples as shown in FIG. 10. In this figure, samples from subjects A, B, and C were processed according to the claimed method. As shown, the sample from subject C lacks a label that is present on the samples from subjects A and B, suggesting that subject C's DNA lacks this particular region. The user can then relate the missing region to physiological characteristics of subject C, or compare subject C's results with those of additional subjects to identify characteristics common to individuals who lack that DNA region.
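The FIG. 10 comparison can be sketched as checking, subject by subject, which reference label positions have no counterpart nearby. This assumes all molecules have already been aligned to a common coordinate system in base pairs; the function names, the positions in the usage, and the 50 bp tolerance are made-up illustrative values.

```python
def missing_labels(reference_bp, sample_bp, tol_bp=50):
    """Reference label positions with no sample label within tol_bp."""
    return [r for r in reference_bp
            if not any(abs(r - s) <= tol_bp for s in sample_bp)]


def compare_subjects(subjects, reference_bp, tol_bp=50):
    """Map each subject name to the reference labels missing from its molecule."""
    return {name: missing_labels(reference_bp, pos, tol_bp)
            for name, pos in subjects.items()}


# Usage with illustrative, pre-aligned label positions (bp).
reference = [0, 1000, 2500, 4000]
subjects = {
    "A": [5, 990, 2510, 4020],
    "B": [0, 1000, 2480, 3990],
    "C": [0, 1010, 4005],
}
print(compare_subjects(subjects, reference))  # only C lacks the label near 2500 bp
```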
Methods of obtaining sequence information about nucleic acid biopolymers are also provided. These methods suitably comprise binding a first fluorescently labeled sequence-specific probe having a first binding sequence to the single-stranded nucleic acid biopolymer. This is shown, for example, in fig. 11. The user then contacts the single stranded nucleic acid biopolymer with a first terminator nucleotide carrying fluorescent label a (e.g., adenine carrying Cy5), with a second terminator nucleotide carrying fluorescent label B (e.g., cytosine carrying Alexa 405), with a third terminator nucleotide carrying fluorescent label C, and with a fourth terminator nucleotide carrying fluorescent label D. The user then illuminates the nucleic acid biopolymer to determine the presence (or relative position) of the first terminator nucleotide, the second terminator nucleotide, the third terminator nucleotide, the fourth terminator nucleotide, or any combination thereof adjacent to the first labeled sequence-specific probe.
The binding sequence of the first probe is suitably 4 to 6 nucleotides long. In some embodiments, the fluorescent labels of the nucleotides have different excitation wavelengths; in others, two or more labels share an excitation wavelength. The excitation wavelengths of the labeled nucleotides and of the labeled sequence-specific probe may be the same or different.
The method also suitably comprises contacting the single-stranded nucleic acid biopolymer with at least four fluorescently labeled probes having second, third, fourth, and fifth binding sequences, respectively. The second binding sequence is suitably constructed by removing the base at the 5' end of the first binding sequence and adding a first alternative base to its 3' end.
Similarly, the third binding sequence is constructed by removing the base at the 5' end of the first binding sequence and adding a second alternative base to its 3' end; the fourth binding sequence by removing the same 5' base and adding a third alternative base; and the fifth binding sequence by removing the same 5' base and adding a fourth alternative base. The probes suitably carry fluorophores that differ from one another and may also differ from the fluorophore on the first probe.
As a non-limiting example, the first probe may comprise the sequence 5'-CTAGC-3'. In the second probing cycle, the C at the 5' end is removed, so that T becomes the new 5' end, and each of the four bases is added at the 3' end, giving 5'-TAGCA-3', 5'-TAGCT-3', 5'-TAGCG-3', and 5'-TAGCC-3'. These labeled probes are then contacted with the biopolymer, and by illuminating them at the appropriate excitation wavelengths, the user can locate the new probes and thereby obtain sequence information about the biopolymer under investigation. Although the binding sequence in this example is 5 bp long, the binding sequence is suitably 1 to 100 bp long, more suitably 4 to 6 bp.
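The probe-set construction in this example is mechanical enough to express directly. A minimal sketch, with probes written 5' to 3' as plain strings; the function names are illustrative.

```python
BASES = "ACGT"


def next_probe_set(probe):
    """Given a probe written 5'->3', drop its 5' base and append each possible
    3' base, giving the four probes for the next probing cycle."""
    core = probe.upper()[1:]
    return [core + base for base in BASES]


def as_oligo(seq):
    """Format a sequence in 5'-...-3' notation."""
    return f"5'-{seq}-3'"


# Usage: the 5'-CTAGC-3' example from the text.
for p in next_probe_set("CTAGC"):
    print(as_oligo(p))
```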
The method also suitably comprises irradiating the nucleic acid biopolymer to determine the presence (or relative position) of the first, second, third, or fourth terminator nucleotide, or any combination thereof, adjacent to the second labeled sequence-specific probe.
FIG. 11 shows a non-limiting embodiment of the process. As shown in this figure, the user can bind first and second probes having different binding sequences to the biopolymer sample. The user may then contact the sample with labeled nucleotides under conditions such that only a single nucleotide binds to the single-stranded DNA adjacent to each bound probe. As a result, each probe-nucleotide pair displays two labels, which may differ from each other, as shown in the figure. The user may then illuminate the sample as necessary to visualize or otherwise locate the probe-nucleotide pairs. The probe and the nucleotide may be joined by a ligase. In some embodiments, there may be a gap of one or more bases between the probe and the nucleotide; this gap may be filled by a polymerase supplied with nucleotides, which may themselves be labeled. Ligase may also be used to join the probe to the labeled nucleotides that fill in such a gap. Non-fluorescent probes may also be used.
After completing a first cycle of probe binding and labeled-nucleotide incorporation, the user can begin a second cycle using a probe designed from the sequence information learned in the first cycle. For example, the first probe may have the sequence AAGG, and the labeled nucleotide incorporated adjacent to the bound probe may be T. In the next cycle, the user can use this information and apply a probe with the sequence AGGT to obtain further sequence information as described above.
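The cycle-to-cycle probe update can be simulated on a known sequence to see how reads accumulate. This sketch simplifies the chemistry: the template is written in the same 5'-to-3' sense as the probes (a real probe hybridizes to the complementary strand), each probe is assumed to bind at exactly one site, and the function name is hypothetical.

```python
def probe_walk(template, first_probe, cycles):
    """Simulate probe walking: find where the probe binds, read the base just
    3' of it, then build the next probe from the last len(probe)-1 bases plus
    the base just read. Returns the first probe followed by the bases read.

    Simplification: template is in the probes' own 5'->3' sense.
    """
    probe, read = first_probe, first_probe
    for _ in range(cycles):
        site = template.find(probe)
        if site == -1 or template.find(probe, site + 1) != -1:
            raise ValueError(f"probe {probe!r} does not bind uniquely")
        nxt = site + len(probe)
        if nxt >= len(template):
            break  # reached the end of the template
        base = template[nxt]
        read += base
        probe = probe[1:] + base  # e.g. AAGG + T -> AGGT
    return read


# Usage: the AAGG -> AGGT example from the text, on a made-up template.
print(probe_walk("CCAAGGTCA", "AAGG", 2))  # reads T, then C
```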
In another aspect, the present invention provides a method of obtaining structural information about a nucleic acid biopolymer. These methods suitably comprise (a) contacting the double-stranded biopolymer with a nicking endonuclease to generate at least two nicking sites; (b) contacting the at least two nicking sites with a first nucleotide carrying a fluorescent label A (e.g., Cy3); (c) removing excess first nucleotide; (d) irradiating the double-stranded biopolymer to determine the presence or relative position of the first nucleotide; (e) contacting the at least two nicking sites with a second nucleotide carrying a fluorescent label B (e.g., Cy5); (f) removing excess second nucleotide; and (g) irradiating the double-stranded biopolymer to determine the presence or relative position of the second nucleotide.
The method suitably further comprises (h) contacting the at least two nicking sites with a third nucleotide carrying a fluorescent label C (e.g., Alexa 405); (i) removing excess third nucleotide; (j) irradiating the double-stranded biopolymer to determine the presence or relative position of the third nucleotide; (k) contacting the at least two nicking sites with a fourth nucleotide carrying a fluorescent label D; (l) removing excess fourth nucleotide; and (m) irradiating the double-stranded biopolymer to determine the presence or relative position of the fourth nucleotide.
In this way, the nicking enzyme "opens" the double-stranded sample, exposing the bases adjacent to the nicking site. The user then introduces the first labeled nucleotide (e.g., cytosine) and analyzes the biopolymer to determine whether and where that nucleotide has been incorporated. The operation is then repeated with the other nucleotides (guanine, thymine, adenine); after each is introduced, the user analyzes its incorporation by irradiation.
The preceding steps (identified as (b) to (m)) can then be repeated, enabling the user to obtain more sequence information by adding each successive labeled nucleotide.
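Steps (b) to (m), repeated over successive nicks, amount to a base-calling loop. The simulation below assumes terminator nucleotides (so each site incorporates at most one base per round), error-free incorporation, and a template given per nicking site in the direction of extension; the default addition order (C, G, T, A) mirrors the order suggested above, and all names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_nick_sites(templates, rounds, order="CGTA"):
    """Simulate sequential addition of labeled terminator nucleotides.

    templates: {site_name: template bases, in the direction of extension}.
    Each round adds one labeled nucleotide species at a time in `order`,
    "images" after each addition, and blocks a site once it has incorporated
    (terminator). A fresh nick then exposes the next template base.
    Returns ({site: called bases}, [(round, nucleotide, sites_lit), ...]).
    """
    calls = {site: "" for site in templates}
    log = []
    for r in range(rounds):
        done = set()
        for nuc in order:
            lit = []
            for site, tmpl in templates.items():
                if site in done or r >= len(tmpl):
                    continue
                if COMPLEMENT[tmpl[r]] == nuc:  # Watson-Crick pairing
                    calls[site] += nuc
                    done.add(site)
                    lit.append(site)
            log.append((r, nuc, lit))
    return calls, log


# Usage: a left site whose next template base is T and a right site with A.
calls, log = sequence_nick_sites({"left": "TGC", "right": "ACG"}, rounds=3)
print(calls)
```

As in the FIG. 12 description, the left site lights up when labeled A is added, while the right site, whose template base is A, does not.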
Illumination is also suitably used to establish the relative positions of the labeled nucleotides. For this analysis, at least a portion of the sample bearing two or more labels is suitably linearized. The user then determines the distances between two or more labeled nucleotides within the linearized portion of the double-stranded biopolymer; these distances can then be used to obtain a barcode for the sample being analyzed.
In some variations, the user creates a second nicking site adjacent to the terminator nucleotide incorporated at the first nicking site. The user suitably contacts the second nicking site with the first nucleotide carrying fluorescent label A, the second nucleotide carrying fluorescent label B, the third nucleotide carrying fluorescent label C, and the fourth nucleotide carrying fluorescent label D, and irradiates the double-stranded biopolymer to identify the labeled nucleotide incorporated at the second nicking site.
This is shown in FIG. 12. In this figure, two nickase molecules bind to the double-stranded DNA sample and create nicking sites at their ends, shown by the boxed N. The user then introduces the labeled nucleotides one at a time. Adenosine is introduced first and binds to a T on the DNA strand opposite the left probe. Because the base opposite the right probe is an adenosine, labeled adenosine does not bind at that site, and an "X" indicates that no binding occurred there after the first labeled base was introduced. By introducing additional nicking enzyme and labeled bases, the user can sequence the target biopolymer, adding labeled bases in turn and irradiating the labeled sample after each addition. The sequence information collected in this way can then be used to design probes that bind to a particular sequence; these probes can in turn be used to barcode a given sample for further characterization, e.g., by comparing the relative distance between two or more labeled probes on a first sample with the distance between the corresponding labeled probes on a different or control sample.
The invention also provides kits for performing multiplex hybridization. These kits suitably comprise a plurality of hybridization probes, each suitably of a different color or responsive to a different excitation wavelength. The kit also suitably includes instructions for applying at least two of the hybridization probes to a nucleic acid sample, linearizing the labeled sample, and imaging at least one hybridized nucleic acid. In some embodiments, a user images two or more hybridization probes to determine the distance or relative position between them.
Under certain conditions, the user can fill the entire biopolymer region between adjacent nick sites with labeled nucleotides. This is suitably done when the nick sites are relatively close to each other. Under illumination, regions containing at least some labeled nucleotides appear bright, while regions lacking labeled nucleotides appear dark. The user can gather information from both the bright and the dark regions.
The bright regions provide sequence information, because the user can illuminate each region with the excitation wavelengths of the labeled nucleotides it contains. The dark regions are informative through their size. In other embodiments, the user measures the distance between the bright regions, or even the individual labeled nucleotides, that flank a dark region. From that distance the user can assess whether the dark region contains a copy number variation, an exon, or another structural feature of interest. Thus, structural information can be collected from both bright and dark regions.
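Measuring dark regions by the bright regions that flank them can be sketched as follows. This is illustrative only. It assumes bright-region coordinates have already been converted from the image into base pairs along the linearized molecule. It also assumes each dark gap in the sample is already paired one-to-one with a reference gap. The function names and the tolerance parameter are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_bp, end_bp) of one bright region

def dark_gap_lengths(bright: List[Segment]) -> List[float]:
    """Lengths of the dark stretches separating consecutive bright regions."""
    segs = sorted(bright)
    return [nxt[0] - cur[1] for cur, nxt in zip(segs, segs[1:])]

def flag_size_changes(sample: List[Segment], reference_gaps: List[float],
                      tolerance_bp: float) -> List[Tuple[int, float]]:
    """Return (gap index, sample length minus reference length) for each dark
    gap whose size differs from its reference gap by more than tolerance_bp.
    Assumes sample and reference gaps are already aligned one-to-one."""
    flagged = []
    for i, (obs, ref) in enumerate(zip(dark_gap_lengths(sample), reference_gaps)):
        if abs(obs - ref) > tolerance_bp:
            flagged.append((i, obs - ref))
    return flagged

sample = [(0, 100), (600, 700), (1500, 1600)]
print(dark_gap_lengths(sample))                    # [500, 800]
print(flag_size_changes(sample, [500, 300], 100))  # [(1, 500)]
```

A flagged gap, such as the 500 bp excess above, is a candidate for an insertion, a copy number gain, or an included exon. It would need further characterization.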
In some embodiments, the user may choose a nicking enzyme whose binding sequence is complementary to a region of particular interest on the biopolymer sample. In this way, the user can efficiently obtain sequence information for only the region or regions deemed to be of greatest interest or importance.
The user can also sequence at least a portion of the biopolymer sample by correlating the sequence of fluorophores visible under illumination with the nucleotides to which those fluorophores correspond.
Other disclosures
Imaging technique
Several techniques improve the optical resolution of fluorescence imaging by at least an order of magnitude. Applying these imaging techniques to single-molecule DNA and RNA analysis greatly facilitates the applications discussed above.
One such technique, Fluorescence Imaging with One-Nanometer Accuracy (FIONA), localizes a single organic fluorophore by fitting a distribution function to the light collected from it. The center of this distribution can be located with an accuracy of about 1.5 nm. FIONA has been used to study the translocation of molecular motors and to measure small distances.
Extensions of this technique include single-molecule high-resolution imaging with photobleaching (SHRImP). SHRImP can resolve neighboring fluorophores of the same color at a resolution of about 10 nm. FIONA has also been extended to two colors in a method called single-molecule high-resolution colocalization (SHREC). With SHREC, the user can colocalize Cy3 and Cy5 dyes as close as 10 nm apart, for example dyes attached to the ends of a short DNA. Multicolor stochastic optical reconstruction microscopy (STORM) can also be used; it relies on combinatorial pairing of reporter and activator molecules. Iterative, color-specific activation of sparse subsets of these probes allows localization with nanometer accuracy.
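The reason fitting the photon distribution beats the diffraction limit can be shown with a one-dimensional simulation. This is a simplified sketch, not FIONA itself. It assumes a Gaussian point-spread function with no background and no pixelation, and that case is used because the mean photon position is the maximum-likelihood estimate of the center. FIONA, by contrast, fits a 2D Gaussian to a pixelated image. The PSF width of 125 nm is an assumed value for visible light.

The simulated spread should track σ/√N, the standard photon-limited scaling. With about 10^4 photons this gives roughly 1 nm, the same order as the 1.5 nm quoted above.

```python
import random
import statistics

def localize(photon_positions):
    """Estimate a fluorophore's center from detected photon positions (1D).

    With a Gaussian PSF and no background or pixelation, the maximum-likelihood
    center estimate is the mean photon position.
    """
    return statistics.fmean(photon_positions)

def localization_precision(psf_sigma_nm, n_photons, trials=200, seed=0):
    """Standard deviation of repeated center estimates for a dye at x = 0."""
    rng = random.Random(seed)
    estimates = [
        localize([rng.gauss(0.0, psf_sigma_nm) for _ in range(n_photons)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

if __name__ == "__main__":
    psf_sigma = 125.0  # nm; assumed PSF standard deviation
    for n in (100, 1000, 10000):
        simulated = localization_precision(psf_sigma, n, trials=100)
        print(n, round(simulated, 2), round(psf_sigma / n ** 0.5, 2))
```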
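The core idea behind resolving two same-color dyes by photobleaching can be sketched in a few lines. This is a simplification. It assumes the two dyes are equally bright, so the center fitted while both emit is their midpoint. Published SHRImP analyses fit subtracted images instead. The function name `shrimp_pair` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def shrimp_pair(both_on: Point, after_one_bleached: Point) -> Tuple[Point, Point]:
    """Recover two same-color fluorophore positions from sequential fits.

    both_on: center fitted while both dyes emit (their midpoint, assuming equal
    brightness). after_one_bleached: center fitted after one dye has bleached,
    i.e. the surviving dye. The bleached dye is the survivor reflected through
    the midpoint.
    """
    (mx, my), (sx, sy) = both_on, after_one_bleached
    bleached = (2 * mx - sx, 2 * my - sy)
    return bleached, after_one_bleached

a, b = shrimp_pair(both_on=(10.0, 0.0), after_one_bleached=(5.0, 0.0))
print(a, b, math.dist(a, b))  # (15.0, 0.0) (5.0, 0.0) 10.0
```

Each fitted center carries its own localization error, and the reflection step doubles the error of the midpoint fit. This is one reason the achievable separation is about 10 nm rather than the single-dye precision.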
Genome mapping method
Structural variations play a very important role in human health and in common diseases. These variations are defined as being longer than 1 kb. Despite their importance, however, most genome-wide approaches for detecting copy number variations (CNVs) are indirect. They predict variant regions from differences in signal intensity between a sample and a control. Such methods therefore provide limited quantitative and positional information, and they cannot detect balanced events such as inversions and translocations. Microarray-based platforms are the primary techniques for discovering structural variations; they include SNP arrays, oligonucleotide comparative genomic hybridization (CGH) arrays, and BAC CGH arrays. These platforms differ in sensitivity, specificity, and probe density, which often leads to conflicting results even for identical samples. These qualitative measurements must then be confirmed by low-throughput methods such as PCR and FISH.
Optical mapping
The single-molecule techniques described above are well suited for studying structural variations. However, because the mapping is optical, they cannot resolve motifs closer together than about 1 kbp. Resolving features less than 100 bp apart would give significantly higher mapping efficiency. This in turn would greatly improve our ability to identify structural variations in native, long genomic DNA molecules.
A suitable mapping protocol is based on labeling the sites generated by nicking endonucleases. A nicking endonuclease with a five-base recognition sequence will, on average, generate a physical map with about 1 kb between sites across the entire genome. In silico whole-genome mapping shows that many of these nick sites fall within 1000 bp of each other, a distance that conventional optics cannot resolve. This reduces map resolution and makes map assembly more difficult.
Consider, as an example, the recognition sequences (motifs) of two commercially available nicking endonucleases, with 5- and 7-base recognition sites. An algorithm was designed to map all of their nick sites onto the human reference genome.
For the enzyme Nt.BstNBI (5-base motif GACTC), there are 2.1 × 10^6 sites across the entire human genome, an average of about 1.5 kb between nicks. Nt.BspQI (7-base motif GCTCTTC) has about 2.2 × 10^5 nick sites. In principle, nick sites for the 5-base motif can be resolved with conventional optics (~1 kbp). However, in silico analysis shows that almost half of them fall within 1 kbp of a neighboring site, which makes them indistinguishable from each other. With the 7-base motif, a greater fraction of the sites can be resolved, but the sites are much sparser. As discussed below, this creates challenges for uniquely mapping DNA fragments.
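The loss of information when nick sites lie closer than the optical resolution can be modeled by merging such sites into single spots. This sketch assumes a sharp resolution cutoff. It also assumes single-linkage merging: a chain of closely spaced sites collapses into one spot. Both are simplifications of real imaging, and the function name is hypothetical.

```python
from typing import List

def resolved_spots(sites: List[int], resolution_bp: int) -> List[float]:
    """Merge nick sites into the spots an imaging system would actually see.

    A site joins the previous spot whenever its gap to the previous site is
    below resolution_bp (single-linkage clustering); each spot is reported at
    the mean position of its member sites.
    """
    if not sites:
        return []
    ordered = sorted(sites)
    clusters = [[ordered[0]]]
    for pos in ordered[1:]:
        if pos - clusters[-1][-1] < resolution_bp:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [sum(c) / len(c) for c in clusters]

sites = [0, 400, 900, 3000, 3050]
print(len(resolved_spots(sites, 1000)))  # 2
print(len(resolved_spots(sites, 100)))   # 4
```

At 1000 bp resolution, five sites collapse into two spots. At 100 bp, only the pair 50 bp apart still merges.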
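Mapping nick sites against a reference sequence starts with finding every occurrence of the recognition motif on both strands. The sketch below reports motif start positions in forward-strand coordinates only. Each enzyme nicks at a fixed offset from its motif, and that offset is deliberately not modeled. A production pipeline would use an indexed genome rather than this linear scan.

```python
from typing import List, Tuple

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif_sites(genome: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of motif on either
    strand, in forward-strand coordinates. Overlapping hits are included."""
    genome, motif = genome.upper(), motif.upper()
    rc = revcomp(motif)
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if rc != motif and window == rc:  # skip duplicate hits for palindromes
            hits.append((i, "-"))
    return hits

print(find_motif_sites("AAGACTCAAGAGTCAA", "GACTC"))  # [(2, '+'), (9, '-')]
```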
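The site counts above can be checked with simple arithmetic. This sketch assumes a haploid human genome of about 3.1 × 10^9 bp. It also models nick sites as randomly scattered (a Poisson process), so gaps between neighbors are exponentially distributed. Real genomes are not random, so the model only roughly approximates the in silico result. Under it, about half of the Nt.BstNBI gaps are shorter than 1 kbp, consistent with "almost half" above, but only about 7% of Nt.BspQI gaps are.

```python
import math

GENOME_BP = 3.1e9  # assumed haploid human genome size

def mean_spacing(n_sites: float, genome_bp: float = GENOME_BP) -> float:
    """Average distance between neighboring nick sites."""
    return genome_bp / n_sites

def fraction_within(resolution_bp: float, n_sites: float,
                    genome_bp: float = GENOME_BP) -> float:
    """Fraction of neighbor gaps shorter than resolution_bp, assuming random
    (Poisson) placement of sites, i.e. exponentially distributed gaps."""
    return 1.0 - math.exp(-resolution_bp / mean_spacing(n_sites, genome_bp))

for name, n in (("Nt.BstNBI", 2.1e6), ("Nt.BspQI", 2.2e5)):
    print(name, round(mean_spacing(n)), round(fraction_within(1000, n), 2))
# Nt.BstNBI 1476 0.49
# Nt.BspQI 14091 0.07
```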
Improving resolution in DNA mapping
In silico mapping was used to determine the percentage of DNA fragments that could be uniquely mapped with currently available nickases and our existing optical detection systems.
Figure 1A shows the results for the nicking endonuclease Nt.BstNBI. At an optical resolution of 1000 bp, only about 12% of the fragments can be uniquely identified using 8 nick sites. By contrast, at a resolution of 100 bp, more than 97% of the fragments are unique. The tightly packed nicking sites carry more sequence information, and their distribution is unique. Furthermore, with only 8 nick sites, a fragment of only about 12 kb (on average) suffices for unique mapping to the reference genome.
The nicking map of the enzyme Nt.BspQI (7-base motif; fig. 1B) shows that increasing the resolution to 100 bp gains little, because fewer Nt.BspQI nicking sites fall within 1 kbp of each other. Using this enzyme, an average of 8 consecutive Nt.BspQI … The DNA lengths that can reasonably be extracted using existing methods often lack enough consecutive nick sites. As a result, a considerable portion of the genome (30%) cannot be mapped.
Without being bound by any single theory, some of the advantages of the claimed invention may be identified. First, much more information about DNA fragments can be obtained when closely spaced nick sites can be distinguished. This greatly enhances the ability to uniquely map fragments against the genome.
Second, as resolution increases, one can resolve much smaller structural variations than is currently possible with optical methods. Finally, increased resolution also helps identify large structural variations.
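Why finer resolution makes fragments unique can be illustrated with a toy uniqueness test. The sketch describes each run of k consecutive nick sites by its k-1 gaps, quantized into resolution-sized bins. It then counts how many of these signatures occur only once in the map. This is a crude model, not the in silico analysis behind Figure 1:

- binning stands in for sizing error;
- unresolved neighboring sites are not merged;
- fragment orientation is ignored;
- the random map is hypothetical.

The function names are illustrative.

```python
import random
from collections import Counter
from typing import List, Tuple

def gap_signatures(sites: List[int], k: int, resolution_bp: int) -> List[Tuple[int, ...]]:
    """One signature per run of k consecutive sites: its k-1 gaps, each
    quantized to resolution_bp bins (a crude stand-in for sizing error)."""
    s = sorted(sites)
    gaps = [(b - a) // resolution_bp for a, b in zip(s, s[1:])]
    return [tuple(gaps[i:i + k - 1]) for i in range(len(gaps) - k + 2)]

def unique_fraction(sites: List[int], k: int, resolution_bp: int) -> float:
    """Fraction of k-site windows whose signature occurs exactly once."""
    sigs = gap_signatures(sites, k, resolution_bp)
    if not sigs:
        return 0.0
    counts = Counter(sigs)
    return sum(1 for sig in sigs if counts[sig] == 1) / len(sigs)

rng = random.Random(1)
toy_map = rng.sample(range(3_000_000), 2_000)  # hypothetical random nick map
for res in (1000, 100):
    print(res, round(unique_fraction(toy_map, 4, res), 3))
```

With coarse bins, many windows share the same gap pattern. Finer bins multiply the number of distinguishable patterns, so almost every window becomes unique.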
Other background on the drawings
An example of a fragment with a 150 kbp insert is shown in figure 1C. To map such a fragment successfully, and thus identify the position of the insert within the genome, a contiguous set of 8 nick sites adjacent to the insert may be used. With limited optical resolution, this requires large (>300 kbp) genomic fragments, which are difficult to generate using standard DNA extraction protocols. In contrast, at a resolution of 100 bp the nick sites are densely distributed, so fragments only slightly larger than the insert can be mapped uniquely.
The need for high-throughput digital profiling of the alternative transcriptome
Another nucleic acid analysis that could benefit greatly from increased mapping capability is alternative splicing of RNA. During splicing of an RNA precursor, introns are removed and exons are joined together to form the mature RNA. Through alternative splicing, a single primary transcript can produce different mature RNAs. This yields protein isoforms with diverse and even antagonistic functions. Recent studies have shown that alternative splicing allows a limited number of genes to generate a large, complex, and diverse proteome. In the human genome, 75% of genes show alternative splicing. Although the human genome contains only about 25,000 genes, it can produce hundreds of thousands of different proteins through alternative splicing.
Alternative splice variants of many genes have a decisive influence on all major aspects of cell biology, including cell cycle regulation and apoptosis. Aberrant splicing has been associated with various diseases, including cancer. Recent studies have shown that mRNAs are alternatively spliced more frequently in cancerous tissues than in normal tissues. Another example is atypical cystic fibrosis. There, aberrant exon inclusion or exclusion significantly reduces full-length cystic fibrosis transmembrane conductance regulator (CFTR) transcripts. A further example is the microtubule-associated protein tau (MAPT gene). Tau is essential for the polymerization and stability of microtubules and for axonal transport in neurons. Aberrant splicing of tau exon 10 leads to FTDP-17, a neurodegenerative disease.
A number of techniques have been developed to quantify RNA splice variants. First, oligonucleotide microarrays and fiber-optic arrays have been used to detect gene splice variants in bulk. However, array technologies query only a short fragment of each transcript at a time, so only one splicing event (two exons) can be detected at a time. It is therefore difficult to determine how many exons are included in or excluded from a particular splice variant. Moreover, nonspecific hybridization can produce many false positives that require further confirmation.
Second, real-time PCR can obtain splicing information by quantifying one exon junction at a time, but it is limited by stringent reaction conditions, low throughput, and high cost. Third, next-generation sequencing techniques have been used for digital gene expression profiling and can also map alternative splice variants. However, they rely mainly on short sequence reads and so share the limitations of microarrays with respect to full-length RNA molecules.
A disadvantage shared by existing technologies focusing on transcriptomes is that none are able to monitor combinations of alternatively spliced exons as they occur within individual transcripts. Under current methods, it is difficult to confirm the exclusion of exons, which can lead to false exclusion of certain exons.
Although alternative splicing is of paramount importance to mammalian biology, it remains challenging to decipher with current methods. Indeed, robust methods to quantify RNA splice variants are lacking. As a result, little is known about how alternative splicing is regulated and coordinated throughout development.
Resolution enhancement beyond conventional optical limitations
As an example of the advantages of increased resolution, consider an optical barcoding method for the microtubule-associated protein tau (MAPT) gene. Its product is essential for the polymerization and stability of microtubules and for axonal transport in neurons. Aberrant splicing of tau exon 10 leads to neurodegenerative diseases such as the dementia FTDP-17.
An exemplary RNA barcode scheme is shown in figure 2. Three exons (2, 3, and 10) in MAPT transcripts can be alternatively spliced, and exon 3 is included only together with exon 2. Thus, six different MAPT transcripts can be produced by alternative splicing. The MAPT gene structure is shown in figure 2A.
All six possible alternatively spliced isoforms (zero, 2, 10, 2+10, etc.) are shown in fig. 2B, together with the length of each exon. Conventional optical resolution cannot discriminate between labels attached to different exons. If the exon positions can be resolved, however, the measured distances between labels identify each splice variant, much as a barcode is read.
To generate barcodes in this example, four exon-specific oligonucleotide probes were designed to hybridize specifically to four exons, as indicated by the green and red arrows in FIG. 2C:
- exon 1 (Cy3, green)
- exon 7 (Cy5, red)
- exon 11 (Cy5, red)
- exon 13 (Cy3, green)

The distances between the labels identify which variant is present. The color sequence (green-red-red-green) indicates a transcript carrying all four labels. Moreover, the barcode scheme of the present disclosure is easily multiplexed.
For example, a different gene could be labeled with four different probes in the same two colors (e.g., green and red). Its color sequence can be designed to differ from the one used for MAPT. A color sequence thus defines a particular gene, and the distances between the labels of that color sequence identify the individual splice variants of the gene. With two colors and four probes, there are 2^4 = 16 different color sequences, so splice variants of 16 different genes can be interrogated simultaneously. With 4 colors and 8 different probes, 4^8 = 65,536 different genes can be investigated simultaneously. That exceeds the number of genes in the entire human transcriptome (fig. 2C).
This approach has three important advantages over current expression profiling techniques for interrogating RNA splicing:
(i) Exons within a single transcript are mapped simultaneously, so one can determine the relationships among multiple alternatively spliced exons in the same transcript.
(ii) Because the barcode scheme is digital, one can quantify individual splice variants and also overall gene expression, by summing all splice variants.
(iii) The barcode scheme provides maximum multiplexing capability.

Achieving these advantages requires imaging techniques with resolution far exceeding that of conventional optical methods.
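How inter-label distances could be decoded into a MAPT isoform can be sketched as follows. The exon lengths are assumed values for illustration: 87 nt for exon 2, 87 nt for exon 3, and 93 nt for exon 10. The constant spacer lengths between probes are hypothetical. The sketch models two distances, since the exon 11 to exon 13 distance is constant. The exon 1 to exon 7 distance depends on exons 2 and 3. The exon 7 to exon 11 distance depends on exon 10.

```python
from itertools import product

EXON_LEN = {2: 87, 3: 87, 10: 93}  # assumed exon lengths (nt)
BASE_1_TO_7 = 400   # hypothetical constant length between the exon 1 and exon 7 probes
BASE_7_TO_11 = 300  # hypothetical constant length between the exon 7 and exon 11 probes

def isoforms():
    """The six MAPT isoforms: exon 3 only with exon 2; exon 10 independent."""
    forms = []
    for n_exons, has_e10 in product(((), (2,), (2, 3)), (False, True)):
        forms.append(frozenset(n_exons + ((10,) if has_e10 else ())))
    return forms

def barcode(isoform):
    """Expected (exon1->exon7, exon7->exon11) label distances in nt."""
    d1 = BASE_1_TO_7 + sum(EXON_LEN[e] for e in (2, 3) if e in isoform)
    d2 = BASE_7_TO_11 + (EXON_LEN[10] if 10 in isoform else 0)
    return (d1, d2)

def decode(measured, tolerance):
    """Isoforms whose expected barcode matches measured distances within tolerance."""
    return [iso for iso in isoforms()
            if all(abs(m - e) <= tolerance for m, e in zip(measured, barcode(iso)))]

for iso in isoforms():
    print(sorted(iso), barcode(iso))
print(decode((487, 393), tolerance=20))
print(len(decode((487, 393), tolerance=100)))  # 6: every isoform fits
```

At a 20 nt tolerance the measurement identifies exactly one isoform (exons 2 and 10). At 100 nt, which is still far finer than conventional optics, all six isoforms remain indistinguishable.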
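The multiplexing capacity quoted above is the number of ordered color sequences, colors^probes. The one-letter color codes below are hypothetical. The count treats each molecule's reading direction as known.

```python
from itertools import product

def color_sequences(colors: str, n_probes: int):
    """All ordered color sequences for n_probes labels drawn from `colors`."""
    return ["".join(seq) for seq in product(colors, repeat=n_probes)]

print(len(color_sequences("GR", 4)))    # 16 = 2**4
print(len(color_sequences("GRBY", 8)))  # 65536 = 4**8
```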
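The digital quantification in point (ii) amounts to counting decoded molecules. This sketch assumes each imaged molecule has already been decoded to a (gene, variant) pair, and the variant labels are hypothetical. Per-variant counts come directly from that decoding, and per-gene expression is their sum.

```python
from collections import Counter

def quantify(decoded_molecules):
    """decoded_molecules: iterable of (gene, variant) pairs, one per molecule.
    Returns (counts per variant, counts per gene)."""
    per_variant = Counter(decoded_molecules)
    per_gene = Counter()
    for (gene, _variant), n in per_variant.items():
        per_gene[gene] += n
    return per_variant, per_gene

molecules = [("MAPT", "2+10"), ("MAPT", "zero"), ("MAPT", "2+10"), ("GENE_X", "a")]
variants, genes = quantify(molecules)
print(variants[("MAPT", "2+10")], genes["MAPT"])  # 2 3
```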
The need for low cost and high throughput whole genome sequencing
The success of the Human Genome Project (HGP) is largely due to the continued development of Sanger sequencing through parallelization, automation, miniaturization, better chemistry, and informatics. As the workhorse of the Human Genome Project, Sanger sequencing has dominated DNA sequencing for nearly thirty years, and its read length of 800 Q20 bases remains significant.
Emerging sequencing technologies can be divided into two categories based on detection method: sequencing by ensemble detection and sequencing by single-molecule detection. Ensemble assays require multiple copies of DNA, so genetic information such as haplotypes and RNA splicing patterns is lost in the process. Single-molecule detection may allow haplotype information to be recovered. However, current single-molecule sequencing methods (e.g., Helicos tSMS) have read lengths of 50 bp or less, much shorter than the average distance of 1 kbp between two SNPs. Thus, as with their predecessor, Sanger sequencing, critical genetic information such as haplotypes and RNA splicing patterns remains difficult to obtain with these "next-generation" techniques. The present invention, in particular, achieves DNA sequencing lengths of more than 10 kb.
Sequencing by hybridization is a well-known method for determining the sequence of nucleic acid molecules using microarray-based hybridization analysis. Typically, short oligonucleotides of known sequence (<100-mers) constructed on a microarray are used to capture (i.e., hybridize) and interrogate target molecules. The microarray analysis produces a list of all hybridized oligonucleotide sequences found at least once in the target molecule. However, this list shows neither the positions of the hybridizing sequences nor the number of times each sequence occurs in the target molecule. The present invention, by contrast, obtains such information.
FIG. 3 shows the starting materials used for sequencing:
- a set of 5-mer (i.e., 5-nucleotide) oligonucleotides labeled at the 5' end with fluorophores of different colors;
- four nucleotide terminators labeled with fluorophores of different colors; and
- a linearized single-stranded DNA molecule, or an array of double-stranded DNA molecules with partial ssDNA gaps.
Figure 4 depicts the first cycle of an exemplary sequencing reaction. After the first cycle, each hybridization and incorporation event was recorded and localized along the linearized DNA molecule by STORM imaging, and the probe was then washed away. In the next cycle, another set of four 5-mer probes, AGTCA, AGTCT, AGTCG, and AGTCC, was introduced. These probes hybridized at the same position as the previous probe because they share its sequence. The polymerase then incorporated a nucleotide terminator (fig. 5).
This process was multiplexed (using labels of different colors), which yielded a large number of sequence reads in one cycle (fig. 6). Algorithms have also been developed to determine the preferred order in which the 5-mer probes are added. Super-resolution imaging techniques used herein include SHRImP, SHREC, and STORM.
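The information lost by a hybridization "spectrum" can be shown directly. The set of k-mers records which sequences occur. A positional map also records where they occur and how often. In the example below, two different sequences produce identical spectra, while their positional maps differ.

```python
from collections import defaultdict

def kmer_spectrum(seq: str, k: int) -> set:
    """Set of k-mers present at least once (what a hybridization array reports)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_positions(seq: str, k: int) -> dict:
    """Map each k-mer to every start position where it occurs."""
    pos = defaultdict(list)
    for i in range(len(seq) - k + 1):
        pos[seq[i:i + k]].append(i)
    return dict(pos)

a, b = "ACGTACGT", "ACGTACGTACGT"
print(kmer_spectrum(a, 3) == kmer_spectrum(b, 3))      # True
print(kmer_positions(a, 3)["ACG"], kmer_positions(b, 3)["ACG"])  # [0, 4] [0, 4, 8]
```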
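One probe-hybridization and terminator-incorporation cycle can be sketched as a string search. To avoid handling strand complementarity explicitly, the sketch works on the strand whose sequence the probes share, i.e. the complement of the template. With that convention, a hybridized probe ending at position i + k is extended by the base at position i + k. The sketch assumes perfect-match hybridization only. It assumes positions come from super-resolution localization. It reports no event when a probe sits at the very end of the strand with no base left to copy. The function name is hypothetical.

```python
from typing import List, Tuple

def probe_cycle(sense: str, probes: List[str]) -> List[Tuple[int, str, str]]:
    """One hybridization/incorporation cycle.

    sense: the strand whose sequence the probes match (complement of the template).
    Returns sorted (position, probe, terminator base) events; the terminator
    continues the sense sequence immediately after the probe's 3' end.
    """
    events = []
    for probe in probes:
        k = len(probe)
        for i in range(len(sense) - k):  # ensure a base exists after the probe
            if sense[i:i + k] == probe:
                events.append((i, probe, sense[i + k]))
    return sorted(events)

print(probe_cycle("TTAGTCAGG", ["AGTCA", "AGTCT", "AGTCG", "AGTCC"]))
# [(2, 'AGTCA', 'G')]
```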
Examples
Single-molecule high-resolution colocalization (SHREC) and single-molecule high-resolution imaging with photobleaching (SHRImP) have been developed to measure the distance between two fluorophores that are closer together than the Rayleigh limit (about 250 nm for visible excitation).
Combining the two techniques extends the capability of localization methods. It could potentially resolve tens of distances by using fluorophores of several colors, with multiple fluorophores of each color. To apply this to DNA, double-stranded DNA can be stretched on a polyacrylic- and polyallylamine-coated surface so that the DNA is relatively straight. To test SHRImP, DNA constructs were made using biotin. Three Cy3 dyes were then added at positions 475 bp, 172 bp, and 94 bp, corresponding to Cy3-Cy3 distances of 32 nm, 58 nm, and 90 nm (fig. 7B).
Further details are provided in fig. 7A. One PCR primer was labeled with Cy3 at its 5' end, and the other primer was phosphorylated at its 5' end. After PCR, the 5' Cy3 label protected its strand from lambda exonuclease, which digests the 5'-phosphorylated strand; single-stranded DNA molecules remained. Once single-stranded DNA is produced, a primer extension reaction can introduce a fluorescent dye at each chosen sequence position. In this case, two short oligonucleotides with Cy3 at the 5' end were hybridized 94 bp and 256 bp from one end, respectively. Another short oligonucleotide with biotin at the 5' end was hybridized at the 3' end of the single-stranded template. Extension by polymerase converts the single-stranded template into a double-stranded DNA molecule and introduces two Cy3 dye molecules at specific sites.
The measured distances were 27 nm, 61 nm, and 95 nm, in excellent agreement with the expected distances. To test SHRImP and SHREC simultaneously, Cy5 was placed at the zero position and two Cy3 dyes at the 94 bp and 172 bp positions. Their positions were measured using a dual-view imaging system. The Cy3-Cy5 distances were 37 ± 5 nm (expected 32 nm) and 91 ± 5 nm (expected 87 nm), and the Cy3-Cy3 distance was 56 ± 3 nm (expected 58 nm) (FIG. 8). The agreement is excellent.
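The quoted Rayleigh limit follows from the standard formula d = 0.61 λ / NA. The emission wavelength of 550 nm and the numerical aperture of 1.4 (a typical oil-immersion objective) are assumed values.

```python
def rayleigh_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Minimum resolvable separation by the Rayleigh criterion."""
    return 0.61 * wavelength_nm / numerical_aperture

print(round(rayleigh_limit_nm(550, 1.4)))  # 240
```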
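The expected distances can be derived from base-pair separations. This sketch assumes a B-DNA rise of about 0.34 nm/bp and fully stretched DNA. At that rise, the expected Cy3-Cy5 distances of 32 nm and 87 nm correspond to 94 bp and 256 bp, the Cy3 sites in the construct described above. The `agrees` helper is a hypothetical two-standard-deviation check, not the authors' statistical test.

```python
RISE_NM_PER_BP = 0.34  # assumed B-DNA rise per base pair

def bp_to_nm(bp: float) -> float:
    """Convert a base-pair separation on stretched dsDNA to nanometers."""
    return bp * RISE_NM_PER_BP

def agrees(measured_nm: float, sd_nm: float, expected_nm: float, k: float = 2.0) -> bool:
    """True if the measurement lies within k standard deviations of expectation."""
    return abs(measured_nm - expected_nm) <= k * sd_nm

for bp, measured, sd in ((94, 37, 5), (256, 91, 5)):
    expected = bp_to_nm(bp)
    print(bp, round(expected, 1), measured, agrees(measured, sd, expected))
# 94 32.0 37 True
# 256 87.0 91 True
```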
Claims (2)
1. A method of obtaining sequence information about a nucleic acid biopolymer, comprising:
binding a first fluorescently labeled sequence-specific probe having a first binding sequence to a single-stranded nucleic acid biopolymer;
contacting the single-stranded nucleic acid biopolymer with a first terminator nucleotide carrying fluorescent label A, with a second terminator nucleotide carrying fluorescent label B, with a third terminator nucleotide carrying fluorescent label C, and with a fourth terminator nucleotide carrying fluorescent label D; and
illuminating the nucleic acid biopolymer to determine the presence, relative position, or both of the first terminator nucleotide, second terminator nucleotide, third terminator nucleotide, fourth terminator nucleotide, or any combination thereof adjacent to the first labeled sequence-specific probe.
2. A method of obtaining structural information about a nucleic acid biopolymer, comprising:
(a) contacting a double-stranded biopolymer with a nicking endonuclease to create at least two nicking sites;
(b) contacting the at least two nicking sites with a first nucleotide carrying a fluorescent label A;
(c) removing excess first nucleotide;
(d) illuminating the double-stranded biopolymer to determine the presence or relative position of the first nucleotide;
(e) contacting the at least two nicking sites with a second nucleotide carrying a fluorescent label B;
(f) removing excess second nucleotide;
(g) illuminating the double-stranded biopolymer to determine the presence or relative position of the second nucleotide;
(h) contacting the at least two nicking sites with a third nucleotide carrying a fluorescent label C;
(i) removing excess third nucleotide;
(j) illuminating the double-stranded biopolymer to determine the presence or relative position of the third nucleotide;
(k) contacting the at least two nicking sites with a fourth nucleotide carrying a fluorescent label D;
(l) removing excess fourth nucleotide; and
(m) illuminating the double-stranded biopolymer to determine the presence or relative position of the fourth nucleotide.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/115,704 | 2008-11-18 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1260391A1 (en) | 2019-12-20 |