
HK1188607B - Random array DNA analysis by hybridization

Info

Publication number: HK1188607B
Application number: HK14101108.1A
Authority: HK
Inventors: R. T. Drmanac
Original assignee: Complete Genomics, Inc.
Priority date: 2003-02-26
Filing date: 2014-02-05
Publication date: 2016-12-16

Description

Random array DNA analysis by hybridization

This application is a divisional application of Chinese national phase application No. 200480010806.3, entitled "Random array DNA analysis by hybridization", which entered the national phase from international application No. PCT/US2004/006022, filed on February 26, 2004.

Cross reference to related applications

This application claims priority to U.S. provisional application 60/450,566, filed February 26, 2003, entitled "Random array DNA analysis by hybridization," attorney docket number CAL-2. Related subject matter is disclosed in co-owned, co-pending U.S. patent application 10/738,108, entitled "Single target molecule analysis by programming multiple transient interactions with probe molecules," attorney docket No. CAL-3, filed December 16, 2003, which claims priority to U.S. provisional application 60/435,539, filed December 20, 2002, entitled "Single target molecule analysis by programming multiple transient interactions with probe molecules," attorney docket No. 30311/39054. These and all other patents and patent applications cited herein are incorporated herein by reference.

Technical Field

The present invention relates to methods of analysing molecules and to apparatus for performing such analyses, in particular methods and apparatus for the reliable analysis of single nucleic acid molecules. Such single molecules may be derived from natural samples, such as cells, tissues, soil, air and water, without isolating or enriching individual components. In certain aspects of the invention, the methods and devices are used to perform nucleic acid sequence analysis or nucleic acid quantification, including gene expression analysis.

Background

There are three established DNA sequencing techniques. The predominant sequencing method in use today is based on Sanger's dideoxy chain termination method (Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463 (1977), incorporated herein by reference in its entirety) and relies on various gel-based separation devices, ranging from manual systems to fully automated capillary sequencers. The Sanger method is technically demanding, and its read length is limited to about 1 kb or less, so multiple reads are required to achieve high accuracy. The second method, pyrosequencing, also uses a polymerase to generate sequence information, by monitoring the pyrophosphate released during successive cycles that test for incorporation of specific DNA bases into a growing strand (Ronaghi, Genome Research 11:3 (2001), incorporated herein by reference in its entirety). This method works well in multiwell plate assays, but is used only for local sequencing of very short fragments of 10-50 bases. This read-length limitation is a serious drawback for sequence-based diagnostics.

The above two techniques are direct sequencing methods, in which the position of each base in the chain is determined directly. Sequencing By Hybridization (SBH) (U.S. Pat. No. 5,202,231; Drmanac et al., Genomics 4:114 (1989), both incorporated herein by reference in their entirety) instead assembles the base sequence of the target DNA indirectly, exploiting the basic biochemistry of base-specific hybridization of complementary nucleic acids. In SBH, overlapping probes of known sequence are hybridized to sample DNA molecules, and computer algorithms use the resulting hybridization pattern to generate the target sequence (co-owned, co-pending U.S. patent application 09/874,772; Drmanac et al., Science 260:1649-. Probes or DNA targets can be arranged in high-density arrays (see, e.g., Cutler et al., Genome Res. 11:1913-1925 (2001), incorporated herein by reference in its entirety). Advantages of the SBH method include experimental simplicity, longer read length, higher accuracy and the ability to analyze multiple samples in a single assay.

Currently, there is an urgent need for new biodefense technologies that can rapidly and accurately detect, analyze and identify all potential pathogens in complex samples. Existing pathogen detection techniques typically lack the sensitivity and selectivity to accurately identify trace amounts of pathogens in a sample, and are difficult to operate. Furthermore, as currently applied, all three sequencing techniques require large amounts of sample DNA, which is typically prepared by one of several amplification methods, primarily PCR. Although the costs of DNA amplification and sample preparation are considerable, these methods, SBH in particular, can provide good sequence-based diagnosis of individual genes or of mixtures of 2-5 genes. However, all existing sequencing methods lack the speed and efficiency required for comprehensive sequence-based pathogen diagnosis and screening of complex biological samples at an acceptable cost. This creates a large gap between the capabilities of the prior art and the new sequencing requirements. Ideally, a suitable diagnostic method should allow simultaneous screening for all important pathogens that may be present in environmental or clinical samples, including engineered mixtures of pathogens hidden in an organism.

The requirements for comprehensive pathogen diagnosis include the ability to simultaneously sequence 10-100 important genes or entire genomes, to screen for hundreds of pathogens, and to screen thousands of samples. For laboratories that perform continuous systematic monitoring, 10-100 Mb of DNA must be sequenced per sample, or 100 Mb to 10 Gb of DNA per day. Existing sequencing methods have about 100-fold lower throughput, and about 100-fold higher cost, than comprehensive pathogen diagnosis and presymptomatic monitoring require.

Existing biosensor technologies use a variety of molecular recognition strategies, including antibodies, nucleic acid probes, aptamers, enzymes, biological receptors and other small-molecule ligands (Iqbal et al., Biosensors and Bioelectronics 15:549-. The molecular recognition element must be coupled to a reporter molecule or label to provide a positive detection event.

Both DNA hybridization and antibody-based techniques have been widely used for pathogen diagnosis. In general, nucleic acid-based techniques are more specific and sensitive than antibody-based detection, but are time-consuming and less robust (Iqbal et al., 2000, supra). DNA amplification (by PCR or cloning) or signal amplification is often required to achieve reliable signal strength, and accurate prior knowledge of the sequence is required to design pathogen-specific probes. Although the development of monoclonal antibodies has increased the specificity and reliability of immunoassays, this technique is relatively expensive and prone to false-positive signals (Doing et al., J. Clin. Microbiol. 37:1582-. Other molecular recognition technologies, such as phage display, aptamers and small-molecule ligands, are still at an early stage of development and are not yet versatile enough to solve all pathogen detection problems.

The major disadvantage of all existing diagnostic techniques is their lack of sensitivity and versatility in detecting and identifying all potential pathogens in a sample. Weapon designers can easily design new biological warfare agents to thwart most pathogen-specific probes or immunoassays. There is clearly a pressing need for effective sequence-based diagnostics.

To this end, the applicants developed a highly efficient genome sequencing system, random DNA array-based sequencing by hybridization (rSBH). rSBH can be used for sequence analysis of all genomes present in complex microbial communities, as well as for genome sequencing of individual humans. rSBH does not require DNA cloning or DNA isolation, and reduces the cost of sequencing relative to methods known in the art.

Disclosure of Invention

The present invention provides novel methods, compositions or mixtures, and devices for analyzing single DNA molecules, capable of rapidly and accurately sequencing any long DNA fragment, mixture of fragments, whole gene, mixture of genes, mixture of mRNAs, long chromosome fragment, whole chromosome, mixture of chromosomes, whole genome or mixture of genomes. In addition, the invention provides methods for identifying nucleic acid sequences within target nucleic acids. Accurate and extensive sequence information is obtained by compiling the data from successive transient hybridizations. In one exemplary embodiment, a single target molecule is transiently hybridized to one probe or a population of probes. Once the probe or probes have dissociated, the target molecule is again transiently hybridized to the next probe or probe population, which may be the same as or different from the probes of the previous transient hybridization. Compiling a series of successive bindings of the same single target molecule to one or more probe molecules of the same kind provides a reliable measurement. Thus, because a single target molecule is contacted with probes serially, sufficient data can be obtained to identify sequences within the target molecule. By compiling these data, the nucleic acid sequence of the entire target molecule can be determined.

The invention also provides methods, compositions and devices for analyzing and detecting pathogens present in complex biological samples at the single organism level, as well as identifying all virulence control genes.

The present invention provides a method of analysing a target molecule, the method comprising the steps of:

a) contacting the target molecule with one or more probe molecules in a series of successive binding reactions, wherein each binding event has an effect on either the target molecule or the probe molecule; and

b) compiling the effects of the series of successive binding reactions.

The present invention also provides a method of analysing a target molecule, the method comprising the steps of:

a) contacting the target molecule with one or more probe molecules in a series of successive hybridization/dissociation reactions, wherein each hybridization has an effect on the target molecule or the probe molecule; and

b) compiling the effects of the series of successive hybridization/dissociation reactions.

In certain embodiments, the series comprises at least 5, at least 10, at least 25, at least 50, at least 100, or at least 1000 consecutive hybridization/dissociation or binding reactions. In one embodiment, the series comprises at least 5 and less than 50 consecutive hybridization/dissociation or binding reactions.
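The value of compiling a series of reactions can be illustrated with a small simulation. The sketch below is a hypothetical model, not the method of the source: it treats each hybridization/dissociation round as an independent trial in which the probe binds the single target with a fixed probability (the probabilities 0.6 and 0.1 and the function names are assumptions for illustration), and compiles the rounds into an event count and an observed binding fraction.

```python
import random
from collections import Counter


def simulate_series(p_bind, n_rounds, rng):
    """Simulate n_rounds successive hybridization/dissociation reactions.

    Hypothetical model: each round is an independent trial in which the
    probe binds the single target molecule with probability p_bind.
    Returns one boolean per round (True = binding event observed).
    """
    return [rng.random() < p_bind for _ in range(n_rounds)]


def compile_effects(outcomes):
    """Compile per-round outcomes into a summary record for one target/probe."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {
        "rounds": n,
        "events": counts[True],
        "fraction": counts[True] / n if n else 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Assumed per-round binding probabilities for two kinds of probe.
    for label, p in [("perfect match", 0.6), ("mismatch", 0.1)]:
        for n in (5, 50):
            record = compile_effects(simulate_series(p, n, rng))
            print(f"{label:13s} rounds={n:2d} events={record['events']:2d} "
                  f"fraction={record['fraction']:.2f}")
```

With 50 rounds the observed fraction is a much tighter estimate of the underlying binding probability than with 5, which is the intuition behind compiling a longer series.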

The invention includes embodiments in which the sequence or structure of the probe molecule is known or determinable. One advantage of such embodiments is that they can be used to identify sequences in the target that give rise to effects on one or more probes of known or determinable sequence. Furthermore, when multiple overlapping sequences are identified within a target molecule, these overlapping sequences can be used to sequence the target molecule.

The invention also provides a method of analysing a target molecule wherein the effect is determined by a time measurement (e.g., the length of time a signal is detected, or the detection of a signal within a predetermined period of time). In certain embodiments, the effect is recorded by measuring the time during which a fluorescent signal is generated by a target molecule or probe molecule.

The invention also provides methods for recording effects by detecting signals that are generated only upon hybridization or binding of a target molecule to a probe. These include methods that record an effect by measuring the length of time over which a signal is generated, and methods that record an effect by measuring the amount of signal generated. In certain embodiments, the target molecule comprises a Fluorescence Resonance Energy Transfer (FRET) donor and the probe molecule comprises a FRET acceptor. In other embodiments, the target molecule comprises a FRET acceptor and the probe molecule comprises a FRET donor.
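A time-based measurement of this kind can be sketched as threshold detection on a per-pixel FRET intensity trace. The example is a hypothetical illustration: the sampling interval `dt`, the fixed intensity threshold, the sample trace and the function names are assumptions, and a real instrument would also need background correction and noise filtering. Because signal is produced only while the probe is hybridized, each contiguous above-threshold run is treated as one binding event.

```python
from typing import Dict, List, Tuple


def detect_events(trace: List[float], threshold: float, dt: float) -> List[Tuple[float, float]]:
    """Return (start_time, duration) for each contiguous run above threshold.

    trace: FRET (acceptor) intensity of one pixel, sampled every dt seconds.
    """
    events = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold:
            if start is None:
                start = i  # probe has just hybridized: signal switched on
        elif start is not None:
            events.append((start * dt, (i - start) * dt))
            start = None  # probe has dissociated: signal switched off
    if start is not None:  # trace ended while the signal was still on
        events.append((start * dt, (len(trace) - start) * dt))
    return events


def summarize(trace: List[float], threshold: float, dt: float) -> Dict[str, float]:
    """Compile number of events, total on-time and integrated above-threshold signal."""
    events = detect_events(trace, threshold, dt)
    return {
        "n_events": len(events),
        "total_on_time": sum(duration for _, duration in events),
        "integrated_signal": sum(v for v in trace if v > threshold) * dt,
    }


trace = [0, 0, 5, 6, 0, 0, 7, 7, 7, 0]  # hypothetical intensities, one value per frame
print(summarize(trace, threshold=1.0, dt=0.1))
```

Both summary quantities correspond to the two recording modes described above: total on-time for the time-period measurement and integrated signal for the signal-amount measurement.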

The invention also provides methods in which the effect on one or more probes is a modification of the probe. In certain embodiments, the probes are ligated, and the method further comprises detecting the ligated probes. The probe may be labeled with a nano-label.

In embodiments where the effect of hybridization or binding on the probe is a modification of the probe, the modifications resulting from a perfect match hybridization are more frequent than those resulting from a mismatch hybridization, and a perfect match can be determined by detecting the presence of a relatively high number of modifications.
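One way to turn "a relatively high number of modifications" into a decision rule is a binomial test against an assumed mismatch modification rate. This is a hedged sketch, not the procedure of the source: the binomial model, the mismatch rate of 0.05, the 20 reactions and the significance level are all assumptions chosen for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_perfect_match(modifications, reactions, mismatch_rate, alpha=0.01):
    """Call a perfect match when the observed number of probe modifications
    (e.g. ligations) in `reactions` trials is unlikely under mismatch-only binding."""
    return binomial_upper_tail(modifications, reactions, mismatch_rate) < alpha


# Assumed: 20 reactions per target/probe pair, 5% chance of a mismatch modification
print(call_perfect_match(8, 20, 0.05))  # True
print(call_perfect_match(2, 20, 0.05))  # False
```

Two modifications out of 20 are quite likely (about 26%) under the assumed mismatch rate, whereas eight are not, so only the latter is called a perfect match.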

In various embodiments, the methods of the invention further comprise one or more of the following:

a) fragmenting a nucleic acid molecule to produce target molecules;

b) performing the fragmentation by restriction enzyme digestion, sonication, sodium hydroxide treatment or low-pressure shearing;

c) detectably labeling the target molecule;

d) detectably labeling the target molecules and/or probe molecules with a label selected from the group consisting of fluorescent labels, nano-labels, chemiluminescent labels, quantum dots, quantum beads, fluorescent proteins, dendrimers with fluorescent labels, micro-transponders, electron donor molecules or molecular structures, and light reflective particles;

e) detecting the label with a Charge Coupled Device (CCD);

f) binding probe molecules that share the same informative region to the same detectable label;

g) one or more probe molecules comprise a plurality of labels;

h) partitioning the probe molecules into pools, each pool comprising at least two probe molecules having different informative regions, wherein all probe molecules within a pool are bound to the same label, which is unique to that pool compared with the other pools;

i) assembling the sequences of the target molecules by ordering the overlapping probe sequences that hybridize to the target molecules;

j) assembling the sequences of the target molecules by ordering the overlapping probe sequences and determining a score, likelihood or probability for the assembled sequence from the hybridization efficiency of the included probes;

k) the informative regions of the probes are each independently between 4 and 20 nucleotides in length;

l) the informative regions of the probes are each independently between 4 and 100 nucleotides in length;

m) the attached target molecules have a length of between about 20 and 2,000 bases;

n) one or more probes comprise at least one modified or universal base;

o) one or more probes comprise at least one universal base at a terminal position;

p) using hybridization conditions effective to hybridize the target molecule only to those probes that are perfectly complementary to a portion of the target molecule;

q) contacting the target with probe molecules comprising at least about 10, at least about 100, at least about 1,000, or at least about 10,000 mutually different informative regions; and/or

r) using fewer than 1,000, 800, 600, 400, 200, 100, 75, 50, 25 or 10 target molecules.
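Items i) and j) describe ordering overlapping probe sequences to assemble a target. A minimal sketch of that idea, assuming an idealized, error-free set of positive probes of a single length k and a target with no repeated (k-1)-mers, chains probes that overlap by k-1 bases. The function name `assemble_from_probes` and the example target are hypothetical, and the sketch does not implement the scoring by hybridization efficiency mentioned in j).

```python
def assemble_from_probes(positives):
    """Order overlapping positive probes (all of length k >= 2) into one sequence.

    Idealized assumptions: every k-mer of the target was detected, no false
    positives are present, and no (k-1)-mer occurs twice in the target.
    """
    probes = set(positives)
    if not probes:
        return ""
    k = len(next(iter(probes)))
    by_prefix = {}
    for p in probes:
        by_prefix.setdefault(p[:-1], []).append(p)
    suffixes = {p[1:] for p in probes}
    # The first probe is the only one whose (k-1)-prefix is not the suffix of any probe.
    starts = [p for p in probes if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("probe set is ambiguous or circular")
    seq = starts[0]
    used = {seq}
    while True:
        candidates = [p for p in by_prefix.get(seq[-(k - 1):], []) if p not in used]
        if len(candidates) != 1:
            break  # end of the target reached (or an ambiguity)
        seq += candidates[0][-1]
        used.add(candidates[0])
    return seq


target = "ATGGCGTACCTA"
k = 5
spectrum = [target[i:i + k] for i in range(len(target) - k + 1)]
print(assemble_from_probes(reversed(spectrum)))  # ATGGCGTACCTA
```

Real hybridization data contain missing and false-positive probes and repeated sequences, which is why j) calls for scoring candidate assemblies rather than the strict unique-extension rule used here.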

In one embodiment, the methods of the present invention can be used to analyze the genomes of the microorganisms in a microbial biofilm and their percentage composition. The biofilm community comprises microorganisms including the Leptospirillum ferriphilum group, Ferrospirillum sp., the Thiobacillus thermosulfidooxidans group, archaea (including Ferroplasma acidarmanus and the A-plasma and G-plasma groups) and eukaryotes (including protists and fungi).

The invention also provides methods of isothermal amplification in which a strand-displacing enzyme generates single-stranded DNA to which an invader oligonucleotide primer can anneal.

The present invention also provides software that supports rSBH analysis of whole genomes (complex DNA samples) and can handle 3 Gbp to 10 Gbp of sequence.

The invention also provides reagents and kits for processing and preparing pathogen DNA from blood samples and for simultaneously analyzing multiple genes or diagnostic regions.

The invention also includes compositions comprising mixtures of probes, target nucleic acids, and linker molecules for analyzing a number of pathogen genes or diagnostic regions from blood, tissue, or environmental samples.

Numerous other aspects and advantages of the present invention will become apparent to those skilled in the art upon consideration of the following detailed description of the invention taken in conjunction with the presently preferred embodiments.

Drawings

The detailed description of the invention may be better understood when read in conjunction with the following drawings:

FIG. 1 depicts adaptor ligation and extension. The double-stranded hairpin adaptor (solid line) is held in hairpin form by cross-linked bases at the hairpin end. B and F represent the bound primer and free primer sequences, respectively, and their complementary sequences are indicated in lower-case letters. Thin lines represent genomic sequence. A) Ligation of the non-phosphorylated adaptor to the genomic DNA leaves a nick in the strand with a free 3' end (arrow). B) Extension from the 3' end displaces the nicked strand and copies the adaptor sequence.

FIG. 2 depicts adaptor design and attachment to DNA fragments, where the solid black bars represent genomic DNA, F represents the free primer, B represents the bound primer, and f and b represent their respective complements.

FIG. 3 depicts the generation of ampliots on the chip surface. A) After melting of the adaptor-ligated genomic DNA, one strand is captured on the slide surface by hybridization to the bound primer B. Polymerase extension from primer B produces a double-stranded molecule. B) The template strand is removed by heating and washing the slide, and the free primer F is introduced and extended along the immobilized strand. C) Continuous strand-displacement amplification from F releases strands that can move to adjacent hybridization sites of primer B. D) The displaced strand is used as a template for extension from the new primer B site.

FIG. 4 depicts the generation of ampliots using RNA intermediates. T7 represents the bacteriophage T7 RNA polymerase promoter. A) The single-stranded adaptor region hybridizes to the bound primer B, and extension with DNA polymerase to form the second strand creates a double-stranded T7 promoter. B) T7 RNA polymerase produces RNA copies (dashed line). C) The RNA then hybridizes to an adjacent primer B, and reverse transcriptase produces cDNA. The RNA strand of the resulting RNA:DNA hybrid is then degraded with RNase H.

FIG. 5 is a schematic diagram depicting an invader-mediated isothermal DNA amplification process.

FIG. 6 depicts random array sequencing by hybridization (rSBH). From top to bottom: (a) A CCD camera is placed above the reaction platform, and a lens magnifies each 1 µm² area of the platform and focuses it onto one pixel of the CCD camera. (b) The array (3 mm x 3 mm) consists of 1 million or more 1 µm² areas, each of which acts as a virtual reaction cell corresponding to a single pixel of the CCD camera. Each pixel corresponds to the same location on the substrate throughout the experiment, so that over a series of reactions in time one CCD pixel combines the data of several reactions, thereby creating a virtual reaction cell. The DNA samples are randomly fragmented and arrayed on the surface of the reaction platform at an average density of one fragment per pixel. (c) The array is subjected to rSBH combinatorial ligation using one of several pools of informative probes, and the signal of each pixel is recorded. (d) The first probe pool is removed, and a second round of rSBH combinatorial ligation is performed on the array with a different pool of probes. (e) Inset showing molecular detail: a Fluorescence Resonance Energy Transfer (FRET) signal is produced by ligation of two probes that hybridize to adjacent complementary sequences of the target.
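Random deposition at an average of one fragment per pixel means that not every pixel receives exactly one fragment. Assuming, as a simplification not stated in the source, that fragments land independently so that the count per pixel follows a Poisson distribution, the expected pixel occupancy can be estimated as follows (the 3 mm x 3 mm array is treated as 3,000 x 3,000 one-micron pixels; the function names are illustrative).

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability that a pixel receives exactly k fragments (Poisson model)."""
    return mean ** k * exp(-mean) / factorial(k)


def pixel_occupancy(mean_per_pixel, n_pixels):
    """Expected number of empty, singly and multiply occupied pixels."""
    empty = poisson_pmf(0, mean_per_pixel)
    single = poisson_pmf(1, mean_per_pixel)
    multiple = 1.0 - empty - single
    return {
        "empty": empty * n_pixels,
        "single": single * n_pixels,
        "multiple": multiple * n_pixels,
    }


# 3 mm x 3 mm array of 1 um x 1 um virtual reaction cells
n_pixels = 3000 * 3000
for mean in (0.5, 1.0, 2.0):
    occ = pixel_occupancy(mean, n_pixels)
    print(mean, {name: round(count) for name, count in occ.items()})
```

At a mean of one fragment per pixel, about 36.8% of pixels (e^-1) are expected to hold exactly one fragment, the same fraction are empty, and the remainder hold two or more; lowering the mean reduces multiply occupied pixels at the cost of more empty ones.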

FIG. 7 depicts the rSBH reaction. The total internal reflection microscopy (TIRM) detection system creates an evanescent field, so that excitation occurs only in a thin region just above the surface of the glass substrate. A FRET signal is generated when the probes hybridize to the arrayed targets and are then ligated, which positions the FRET pair within the evanescent zone. Unligated probes produce no detectable signal, whether free in solution or transiently hybridized to the target. Thus, the evanescent zone of the TIRM system provides a strong signal in the desired plane while reducing background noise from unreacted probes.
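The thickness of the evanescent zone follows from the standard total internal reflection relation for the intensity penetration depth, d = lambda / (4*pi*sqrt(n1^2*sin^2(theta) - n2^2)), with intensity decaying as I(z) = I0*exp(-z/d). The sketch below evaluates it for assumed values that do not come from the source: a glass substrate (n1 = 1.52), aqueous buffer (n2 = 1.33) and 532 nm excitation at a 70 degree incidence angle.

```python
from math import asin, degrees, exp, pi, radians, sin, sqrt


def critical_angle_deg(n1, n2):
    """Critical angle for total internal reflection from medium n1 into n2 (n1 > n2)."""
    return degrees(asin(n2 / n1))


def penetration_depth_nm(wavelength_nm, n1, n2, theta_deg):
    """Intensity penetration depth d of the evanescent field, in nm."""
    s = (n1 * sin(radians(theta_deg))) ** 2 - n2 ** 2
    if s <= 0:
        raise ValueError("incidence angle is at or below the critical angle")
    return wavelength_nm / (4 * pi * sqrt(s))


def relative_intensity(z_nm, depth_nm):
    """Excitation intensity at height z above the surface, relative to z = 0."""
    return exp(-z_nm / depth_nm)


# Assumed example values: glass substrate, aqueous buffer, 532 nm laser, 70 degrees
n_glass, n_water, wavelength, theta = 1.52, 1.33, 532.0, 70.0
d = penetration_depth_nm(wavelength, n_glass, n_water, theta)
print(f"critical angle: {critical_angle_deg(n_glass, n_water):.1f} deg")
print(f"penetration depth: {d:.0f} nm")
print(f"relative intensity at 100 nm: {relative_intensity(100.0, d):.2f}")
```

With these assumed values the depth is roughly 80 nm, and at 100 nm above the surface the excitation intensity has fallen to roughly 30% of its surface value, consistent with confining detection to probes ligated on surface-bound targets.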

FIG. 8 depicts sequence assembly. In the SBH method, overlapping positive probes are typically used to assemble the target sequence. Each base is therefore read several times (e.g., 10 times with 10-mer probes), which ensures very high accuracy even if some probes are scored incorrectly.
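The redundancy described for FIG. 8 can be illustrated by majority voting over probes whose positions on the target are already known (for example, from assembly). In this hypothetical sketch, each interior base of a 12-base target is covered by four overlapping 4-mer probes, so a single incorrectly recorded probe is outvoted. The target, the function names and the voting rule are illustrative assumptions.

```python
from collections import Counter


def consensus_from_placed_probes(placed, length):
    """Majority-vote base call at each position from probes of known placement.

    placed: list of (start_position, probe_sequence) pairs on the target.
    Positions covered by no probe are reported as 'N'.
    """
    votes = [Counter() for _ in range(length)]
    for start, probe in placed:
        for offset, base in enumerate(probe):
            pos = start + offset
            if 0 <= pos < length:
                votes[pos][base] += 1
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)


def coverage(placed, length):
    """Number of probes covering each target position."""
    depth = [0] * length
    for start, probe in placed:
        for pos in range(max(start, 0), min(start + len(probe), length)):
            depth[pos] += 1
    return depth


target = "ACGTACGGTCAT"
k = 4
placed = [(i, target[i:i + k]) for i in range(len(target) - k + 1)]
# Simulate one incorrectly recorded probe: "ACGG" at position 4 read as "ACTG".
placed_with_error = [(s, "ACTG" if s == 4 else p) for s, p in placed]
print(coverage(placed, len(target)))  # [1, 2, 3, 4, 4, 4, 4, 4, 4, 3, 2, 1]
print(consensus_from_placed_probes(placed_with_error, len(target)))  # ACGTACGGTCAT
```

Interior bases reach a depth of k, matching the "each base is read k times" statement; the end bases are covered less often, so errors there are harder to outvote.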

FIG. 9 depicts a schematic of a microfluidic device for the rSBH method. The device integrates DNA preparation, random single-molecule DNA array formation, combinatorial pool mixing, and cyclic loading and washing of the reaction chamber. When a sample tube is attached to the chip, a series of reactions using preloaded reagents isolates and fragments the DNA, which is randomly attached to the chip surface at a density of approximately one molecule per pixel. Two probe pools, one from each of the 5' and 3' sets of informative probe pools (IPPs), are then mixed with the reaction solution in the microfluidic device. The pools of one set are labeled with a FRET donor, and the pools of the other set with a FRET acceptor. The mixed pools, containing DNA ligase, are then transferred to the reaction chamber above the single-molecule DNA array. A detectable ligation event occurs when two probes (one from each pool) hybridize to adjacent complementary sequences of a target DNA molecule within the narrow region (100 nm) illuminated by total internal reflection at the array surface. Ligation of the 5' and 3' probes within this region generates a FRET signal, which is detected and recorded with an ultrasensitive CCD camera. After the ligation events are recorded, the pool mixture is removed with a wash solution, and a second pair of pools from the same two IPP sets, preloaded on the microfluidic chip, is combined and introduced into the reaction chamber. By testing all possible pairings of pools from the two IPP sets, the presence or absence of every possible combination of 5' and 3' probe sequences is recorded for each target molecule in the array.
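The combinatorial pooling scheme can be sketched as follows. This is a simplified, hypothetical model rather than the device's actual chemistry: probes are written in the same sense as the target (ignoring complementarity), a "ligation" is scored whenever a 5' probe is immediately followed by a 3' probe in the target, and pools are formed by an arbitrary round-robin split. With all 64 trimers split into 4 pools per set, 16 pool-pair reactions replace 64 x 64 = 4,096 individual probe-pair reactions.

```python
from itertools import product


def make_pools(probes, n_pools):
    """Hypothetical round-robin partition of a probe set into n_pools pools.

    In the scheme described in the text, each pool would carry one pool-specific label.
    """
    pools = [[] for _ in range(n_pools)]
    for i, probe in enumerate(sorted(probes)):
        pools[i % n_pools].append(probe)
    return pools


def ligation_positive(target, probe5, probe3):
    """Simplified rule: the 5' probe is immediately followed by the 3' probe."""
    return (probe5 + probe3) in target


def pool_pair_matrix(target, pools5, pools3):
    """Presence/absence of a ligation signal for every (5' pool, 3' pool) reaction."""
    return [
        [any(ligation_positive(target, a, b) for a, b in product(p5, p3)) for p3 in pools3]
        for p5 in pools5
    ]


trimers = ["".join(t) for t in product("ACGT", repeat=3)]
pools5 = make_pools(trimers, 4)  # 5' probe set (e.g. donor-labeled)
pools3 = make_pools(trimers, 4)  # 3' probe set (e.g. acceptor-labeled)
matrix = pool_pair_matrix("ACGTTGCA", pools5, pools3)
for row in matrix:
    print(["+" if hit else "-" for hit in row])
```

A positive pool pair narrows the candidate probe pairs to the members of those two pools; decoding the individual probes requires combining results across reactions, which this sketch does not attempt.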

FIG. 10 depicts the basic optics and optical paths of a TIRM device. (a) A conventional substrate on top of a prism, and the light path that creates the evanescent field. (b) and (c) Use of a galvanometer to control the optical path from the laser to the prism assembly.

FIG. 11 depicts a schematic of the rSBH components and process, giving a step-by-step description of the components of the rSBH apparatus and of the experimental method. Samples are collected and prepared independently of the device (steps 1 and 2). The resulting raw samples are further processed by the sample integration module (component A) for rSBH array formation (step 3). The targets are then arrayed on the substrate module in the reaction cassette (component B). SBH probes are delivered through the probe module (component C), and the sample is subjected to the SBH ligation assay (step 4). The resulting raw data are processed for sequence assembly (step 5) and interpretive analysis (step 6).

FIG. 12 shows perfect-match ligation signals for four spotted targets. The four different targets were spotted at 7 different concentrations ranging from 1 to 90 micromolar. The ligation probe amount (5' probe:3' probe ratio 1:1) varied from 0.1 to 1 picomole per 20 microliters.

FIG. 13 shows the use of a spotted target as a capture probe for another target. Ligation signals were measured when slides were directly hybridized/ligated with the Tgt2-5' probe and Tgt2-3' probe (circles), and when slides were prehybridized with the target Tgt2-Tgt1-rc and then ligated with the Tgt2-5' probe and Tgt2-3' probe (squares).

Detailed Description

The present invention provides methods and apparatus for single-molecule DNA analysis that can rapidly and accurately sequence any long DNA fragment, mixture of fragments, entire gene, mixture of genes, mixture of mRNAs, long chromosome fragment, entire chromosome, mixture of chromosomes, entire genome or mixture of genomes. The method of the invention allows detection of pathogens present in complex biological samples at the level of a single organism, and identification of virulence control genes. The method combines hybridization, in particular Sequencing By Hybridization (SBH), with Total Internal Reflection Microscopy (TIRM) or other sensitive optical methods using fluorescent or nanoparticle labels, or with electrical detection methods. The present invention also provides sample arraying techniques that create virtual reaction chambers associated with individual pixels of an ultrasensitive charge-coupled device (CCD) camera. The arrayed genomes are probed repeatedly to decode their sequences, using a complete (universal) set of informative pools of fluorescently labeled oligonucleotide probes and a combinatorial ligation method. The informative fluorescent signals are converted into assembled sequence data using bioinformatics algorithms (co-owned, co-pending U.S. patent application 09/874,772; Drmanac et al., Science 260:1649-. The device can sequence over 100 megabases of DNA per hour (about 30,000 bases/second) with a single small instrument located in a diagnostic laboratory or a small mobile laboratory. Because random single-molecule arrays have a large capacity, trace amounts of pathogen DNA in complex biological samples can be detected, identified and sequenced using the methods of the invention. Thus, random array SBH (rSBH) provides technology that allows DNA sequencing to play an important role in defense against biological warfare agents, among other sequencing applications.
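The throughput figures can be checked with simple arithmetic. The sketch below converts the stated rate of 100 megabases per hour into bases per second and estimates the instrument time for the daily requirements given in the Background (100 Mb to 10 Gb); the helper names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def bases_per_second(bases_per_hour):
    """Convert a throughput in bases per hour to bases per second."""
    return bases_per_hour / SECONDS_PER_HOUR


def hours_for(total_bases, bases_per_hour):
    """Instrument hours needed to read total_bases at the given rate."""
    return total_bases / bases_per_hour


rate = 100e6  # 100 megabases per hour, as stated in the text
print(round(bases_per_second(rate)))  # 27778
for requirement in (100e6, 10e9):  # daily requirements from the Background
    print(f"{requirement:.0e} bases: {hours_for(requirement, rate):.0f} h")
```

100 Mb per hour corresponds to about 27,800 bases per second (rounded in the text to 30,000). At that rate, 100 Mb takes about one hour, while 10 Gb takes about 100 hours of instrument time.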

The present invention provides a single-DNA-molecule analysis method for the rapid and accurate detection and identification of any pathogen, and more generally for the analysis of any DNA, including the DNA of individual humans, in a complex biological mixture of pathogen, host and environmental DNA. The method of the invention allows the detection of pathogens present in a sample at the level of a single organism, and the identification of all virulence control genes. The method applies combinatorial hybridization/ligation with sets of universal informative probe pools (IPPs) to random single-molecule arrays, using the arrayed molecules either directly or after in situ amplification of each individual molecule by about 10-, 100-, 1,000- or 10,000-fold.

In a typical assay, millions of randomly arrayed single DNA molecules obtained from a sample are hybridized with pairs of IPPs representing a universal library of all possible probe sequences 8 to 10 bases in length. When two probes hybridize to adjacent complementary sequences in the target DNA, they are ligated, producing a positive record for that target molecule. The accumulated records are then compiled, and the overlapping probe sequences are used to assemble the target sequence.

In another embodiment of the invention, the features or sequences of individual targets can be used to assemble longer sequences of an entire gene or genome. Furthermore, by counting how many times the same molecule, or fragments from the same gene, appear in the array, gene expression or the amount of pathogen DNA can be quantified, and these data can be combined with the sequences obtained.
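Counting-based quantification can be sketched as tallying how many occupied array positions are assigned to each gene. The gene identifiers and lengths below are hypothetical, and the per-kilobase length normalization is an added assumption rather than something the source specifies; it corrects for longer genes yielding more fragments.

```python
from collections import Counter


def quantify(assignments, gene_lengths=None):
    """Count array positions per gene and report relative abundance.

    assignments: the gene identifier assigned to each occupied array position,
    or None when a position could not be assigned.
    gene_lengths: optional {gene: length in bases}; if given, a hypothetical
    per-kilobase normalization is added.
    """
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    result = {}
    for gene, n in counts.items():
        entry = {"count": n, "fraction": n / total}
        if gene_lengths and gene in gene_lengths:
            entry["per_kb"] = n / (gene_lengths[gene] / 1000)
        result[gene] = entry
    return result


# Hypothetical assignments for six occupied pixels
calls = ["geneA", "geneA", "geneB", None, "geneA", "geneB"]
print(quantify(calls, gene_lengths={"geneA": 3000, "geneB": 1000}))
```

In this toy example geneA has more raw counts than geneB, but after length normalization geneB is the more abundant of the two.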

SBH is a well-developed technique that can be performed by many methods known to those skilled in the art. In particular, the technology relates to sequencing by hybridization as discussed in the following references, which are incorporated herein by reference in their entirety: Bains and Smith, J. Theor. Biol. 135:303-307 (1988); Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981); Broude et al., Proc. Natl. Acad. Sci. USA 91:3072-; Breslauer et al., Proc. Natl. Acad. Sci. USA 83:3746-3750 (1986); Doty et al., Proc. Natl. Acad. Sci. USA 46:461-466 (1990); Chee et al., Science 274:610-614 (1996); Cheng et al., Nat. Biotechnol. 16:541-546 (1998); Dianzani et al., Genomics 11:48-53 (1991); PCT International patent application WO95/09248 to Drmanac; PCT International patent application WO96/17957 to Drmanac; PCT International patent application WO98/31836 to Drmanac; PCT International patent application WO99/09217 to Drmanac et al.; PCT International patent application WO00/40758 to Drmanac et al.; PCT International patent application WO 56937; Drmanac and Jin, commonly owned, co-pending U.S. patent application 09/874,772; Drmanac and Crkvenjakov, Scientia Yugoslavica 16:99-107 (1990); Drmanac and Crkvenjakov, Intl. J. Genome Res. 1:59-79 (1992); Drmanac and Drmanac, Meth. Enzymol. 303:165-178 (1999); Drmanac et al., U.S. Pat. No. 5,202,231; Drmanac et al., Nucl. Acids Res. 14:4691-4692 (1986); Drmanac et al., Genomics 4:114-; Drmanac et al., J. Biomol. Struct. Dyn. 8:1085-1102 (1991); Drmanac et al., "Partial sequencing by hybridization: concept and applications in genome analysis," The First International Conference on Electrophoresis, Supercomputing and the Human Genome, pp. 60-74, World Scientific Publishing Co., Singapore, Malaysia (1991); Drmanac et al., The First International Conference on Electrophoresis, Supercomputing and the Human Genome, Cantor et al., eds., World Scientific Publishing Co., Singapore, pp. 7-59 (1991); Drmanac et al., Nucl. Acids Res. 19:5839-5842 (1991); Drmanac et al., Electrophoresis 13:566-573 (1992); Drmanac et al., Science 260:1649-; Drmanac et al., DNA and Cell Biol. 9:527-534 (1994); Drmanac et al., Genomics 37:29-40 (1996); Drmanac et al., Nature Biotechnology 16:54-58 (1998); Gunderson et al., Genome Res. 8:1142-1153 (1998); Hacia et al., Nature Genetics 14:441-; Hacia et al., Genome Res. 8:1245-; Hoheisel et al., Mol. Gen. 220:903-14:125-132 (1991); Hoheisel et al., Cell 73:109-120 (1993); Holley et al., Science 147:1462-; Housby and Southern, Nucl. Acids Res. 26:4259-4266 (1998); Hunkapiller et al., Science 254:59-63 (1991); Khrapko, FEBS Lett. 256:118-; Kozal et al., Nature Medicine 7:753-759 (1996); Labat and Drmanac, "Sequencing simulation and sequence reconstruction of random DNA clones hybridized with a small number of oligomeric probes," The Second International Conference on Electrophoresis, Supercomputing and the Human Genome, pp. 555-565, World Scientific Publishing Co., Singapore, Malaysia (1992); Lehrach et al., Genome Analysis: Genetic and Physical Mapping 1:39-81 (1990), Cold Spring Harbor Laboratory Press; Lysov et al., Dokl. Akad. Nauk SSSR 303:1508-1511 (1988); Lockhart et al., Nat. Biotechnol. 14:1675-1680 (1996); Maxam and Gilbert, Proc. Natl. Acad. Sci. USA 74:560-; Meier et al., Nucl. Acids Res. 26:2216-2223 (1998); Michiels et al., CABIOS 3:203-210 (1987); Milosavljevic et al., Genome Res. 6:132-141 (1996); Milosavljevic et al., Genomics 37:77-86 (1996); Nikiforov et al., Nucl. Acids Res. 22:4167-4175 (1994); Pevzner and Lipshutz, "Towards DNA sequencing chips," Mathematical Foundations of Computer Science (1994); Poustka and Lehrach, Trends Genet. 2:174-179 (1986); Privara et al., eds., pp. 143-158, 19th International Symposium, MFCS '94, Kosice, Slovakia, Springer-Verlag, Berlin (1995); Saiki et al., Proc. Natl. Acad. Sci. USA 86:6230-; Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-; Scholler et al., Nucl. Acids Res. 23:3842-3849 (1995); Southern, PCT International application WO 89/10977; Southern, U.S. Pat. No. 5,700,637; Southern et al., Genomics 13:1008-1017 (1992); Strezoska et al., Proc. Natl. Acad. Sci. USA 88:10089-; Sugimoto et al., Nucl. Acids Res. 24:4501-; Wallace et al., Nucl. Acids Res. 6:3543-3557 (1979); Wang et al., Science 280:1077-1082 (1998); Wetmur, Crit. Rev. Biochem. Mol. Biol. 26:227-.

Advantages of rSBH:

rSBH minimizes or eliminates blocking interactions between target DNA molecules because the target molecules are attached at an appropriate distance from one another. The low complexity of the DNA sequence at each spot (200-300 bases) reduces the likelihood of mutually blocking inverted repeats. In some fragments, palindromic sequences and hairpin arms can be separated by introducing an average of one nick per 20 bases of source DNA and ligating the pieces to non-complementary primer DNA. False positives are minimized because overlapping fragments have different repeats and/or strongly mismatched sequences. Probe-probe ligation produces 11- to 13-mer probes, and combining hybridization/ligation specificity with the differential stability of perfect-match and mismatch duplexes during washing can generate more accurate data. rSBH also provides an efficient method of ligating three probes in solution, which is useful for analyzing short DNA. Patterned probe libraries can be used effectively for both probe components to provide more informative data. Another advantage is that only very small amounts of source DNA are required, and the need to prepare standard probe-spot arrays is eliminated, which reduces cost. rSBH permits multiplex sequencing of up to 1000 samples, each labeled with different primers and linkers. Furthermore, the present invention provides for the detection of single variants in a pool of up to one million individual samples. By counting the two variants, heterozygotes can be detected. The present invention provides 10 to 100,000 times more information per unit surface area than standard arrays.

1. Preparation and labelling of polynucleotides

The practice of the invention utilizes a variety of polynucleotides. Typically, some polynucleotides are detectably labeled. The types of polynucleotides used in the practice of the present invention include target nucleic acids and probes.

The term "probe" refers to a relatively short polynucleotide, preferably DNA. The probe is preferably at least 1 base shorter than the target nucleic acid; more preferably it is 25 bases or less in length, and most preferably 20 bases or less. Of course, the optimal length of the probe will depend on the length of the target nucleic acid being analyzed. In de novo sequencing (without a reference sequence) of a target nucleic acid of about 100 or fewer bases, the probe is preferably at least a 7-mer; for a target nucleic acid of about 100-200 bases, the probe is preferably at least an 8-mer; for a target nucleic acid of about 200-400 bases, the probe is preferably at least a 9-mer; for a target nucleic acid of about 400-800 bases, the probe is preferably at least a 10-mer; for a target nucleic acid of about 800-1600 bases, the probe is preferably at least an 11-mer; for a target nucleic acid of about 1600-3200 bases, the probe is preferably at least a 12-mer; for a target nucleic acid of about 3200-; for a target nucleic acid of about 6400-12,800 bases, the probe is preferably at least a 14-mer. For every further two-fold increase in the length of the target nucleic acid, the optimal probe length increases by one additional base.
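The length rule above can be encoded as a small helper. This is a sketch of the stated rule of thumb only; the function name and the handling of lengths that fall exactly on a range boundary are assumptions, not part of the original method.

```python
def min_probe_length(target_len: int) -> int:
    """Minimum informative probe length (k) for de novo SBH of a target.

    Encodes the rule of thumb above: at least a 7-mer for targets of about
    100 bases or less, plus one base for every further doubling of length.
    Assumption: a length exactly on a boundary (e.g. 200 bases) is assigned
    to the lower tier, because the stated ranges share their endpoints.
    """
    if target_len <= 0:
        raise ValueError("target_len must be a positive number of bases")
    k, upper = 7, 100          # 7-mer for targets up to ~100 bases
    while target_len > upper:  # each doubling of the target adds one base
        upper *= 2
        k += 1
    return k


if __name__ == "__main__":
    for n in (100, 150, 300, 1000, 3000, 10_000):
        print(f"{n:>6} bases -> at least a {min_probe_length(n)}-mer")
```

For example, a 1000-base target maps to at least an 11-mer and a 10,000-base target to at least a 14-mer, in line with the ranges listed above.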

One skilled in the art will recognize that, for probes that are ligated in SBH applications, the relevant probe length is the combined length of the ligated probes. The probe is typically single-stranded, although double-stranded probes may be used in some applications.

While probes are generally composed of naturally occurring bases and phosphodiester backbones, they need not be. For example, the probe may consist of one or more modified bases, such as 7-deazaguanosine or a universal "M" base, or one or more modified backbone linkages, such as phosphorothioate. The only requirement is that the probe is capable of hybridizing to the target nucleic acid. It will be apparent to those skilled in the art that various modified bases and backbone linkages are known to be useful in conjunction with the present invention.

The probe length mentioned above refers to the length of the information content of the probe, which is not necessarily the actual physical length of the probe. Probes for SBH often contain degenerate ends that do not contribute to the information content of the probe. For example, probes for SBH applications are often described by the formula N_xB_yN_z, wherein N represents any of the 4 bases and varies among the polynucleotides in a given mixture, B represents any of the 4 bases but is the same for every polynucleotide in a given mixture, and x, y and z are integers. Typically, x and z are independent integers between 0 and 5, and y is an integer between 4 and 20. The number of known bases, y, defines the "information content" of the polynucleotide, since the degenerate ends do not contribute to it. Linear arrays comprising a mixture of immobilized polynucleotides are useful, for example, in sequencing by hybridization. In these degenerate probe mixtures, hybridization discrimination of mismatches applies only to the information-content length, not to the entire physical length.
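To make the N_xB_yN_z notation concrete, the following sketch expands a single degenerate mixture and contrasts its physical length with its information content. The helper names are illustrative, and the sketch assumes only the four standard bases.

```python
from itertools import product

BASES = "ACGT"


def degenerate_mixture(core: str, x: int, z: int) -> list[str]:
    """Expand one N_xB_yN_z mixture: every member shares the fixed core B_y.

    x and z are the numbers of degenerate N positions on the 5' and 3' ends;
    the length of `core` is y, the information content of the probe.
    """
    mixture = []
    for left in product(BASES, repeat=x):
        for right in product(BASES, repeat=z):
            mixture.append("".join(left) + core + "".join(right))
    return mixture


def describe(core: str, x: int, z: int) -> dict[str, int]:
    """Contrast physical length with information content for one mixture type."""
    y = len(core)
    return {
        "physical_length": x + y + z,
        "information_content": y,                # degenerate ends add no information
        "members_per_mixture": 4 ** (x + z),     # all N combinations around one core
        "possible_cores": 4 ** y,                # distinct mixtures of this N_xB_yN_z type
    }


mix = degenerate_mixture("ACGTAC", x=1, z=1)
print(len(mix), describe("ACGTAC", x=1, z=1))
```

With the 6-base core ACGTAC and one degenerate base on each side, the mixture contains 16 distinct 8-mers, yet each carries only 6 bases of information.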

Probes for use in the present invention can be prepared by techniques well known in the art, such as automated synthesis using an Applied Biosystems synthesizer. In addition, probes can be prepared by the method of Genosys Biotechnologies, which uses a large number of porous Teflon discs. The source of the oligonucleotide probes is not critical for the purposes of the present invention, and those skilled in the art will appreciate that other methods of preparing oligonucleotides, now known or later developed, may suffice.

The term "target nucleic acid" refers to a polynucleotide, or portion of a polynucleotide, for which sequence information is sought, typically a polynucleotide that is sequenced in an SBH assay. The target nucleic acid can be any number of nucleotides in length, depending on the length of the probe, but is generally about 100, 200, 400, 800, 1600, 3200, 6400 or more nucleotides in length. A sample typically has more than 100, more than 1000, more than 10,000, more than 100,000, more than one million, or more than ten million targets. The target nucleic acid can be composed of ribonucleotides, deoxyribonucleotides, or mixtures thereof. Typically, the target nucleic acid is DNA. Although the target nucleic acid may be double-stranded, it is preferably single-stranded. Moreover, the target nucleic acid can be obtained from virtually any source. Depending on its length, it is preferably cut into fragments of the above sizes before being analyzed by SBH. Like probes, target nucleic acids can include one or more modified bases or backbone linkages.

The target nucleic acid can be obtained from any suitable source, such as cDNA, genomic DNA, chromosomal DNA, microdissected chromosomal bands, cosmid or Yeast Artificial Chromosome (YAC) inserts, and RNA, including mRNA that has not undergone any amplification steps. For example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, New York (1989), incorporated herein by reference in its entirety, describes three protocols for isolating high molecular weight DNA from mammalian cells (pages 9.14-9.23).

The polynucleotides are then generally fragmented by any method known to those skilled in the art, including, for example, digestion with restriction enzymes, sonication, and NaOH treatment as described in Sambrook et al. (1989) at pages 9.24-9.28. A particularly useful method for fragmenting DNA uses the two-base-recognition endonuclease CviJI, described by Fitzgerald et al., Nucl. Acids Res. 20:3753-3762 (1992), incorporated herein by reference in its entirety.

In a preferred embodiment, the target nucleic acids are prepared such that they cannot be ligated to each other, for example by treating fragmented nucleic acids obtained by enzymatic digestion or physical shearing with a phosphatase (e.g., calf intestinal phosphatase). Alternatively, non-ligatable fragments of the sample nucleic acid can be generated in a Sanger dideoxy sequencing reaction of the sample nucleic acid by using random primers (e.g., N₅-N₉, wherein N = A, G, T or C) that have no phosphate at the 5' end.

In most cases, it is important to denature the DNA to produce single strands that can be used for hybridization. This can be done by incubating the DNA solution at 80-90 ℃ for 2-5 minutes. The solution is then rapidly cooled to 2 ℃ to prevent renaturation of the DNA fragments before they are contacted with the probes.

The probe and/or target nucleic acid may be detectably labeled. Virtually any label that produces a detectable signal and that can be immobilized on a substrate or attached to a polynucleotide can be used in conjunction with the arrays of the invention. The signal generated is preferably quantifiable. Suitable labels include, but are not limited to, radioisotopes, fluorophores, chromophores and chemiluminescent moieties.

Polynucleotide sequences labeled with fluorophores are preferred because of their ease of detection. Fluorophores suitable for labeling polynucleotides are described, for example, in the Molecular Probes catalog (Molecular Probes, Eugene, Oreg.) and references cited therein. Methods for attaching fluorophore labels to polynucleotides are well known and can be found, for example, in Goodchild, Bioconjugate Chem. 1:165-187 (1990), incorporated herein by reference in its entirety. A preferred fluorophore label is Cy5 dye, which is commercially available from Amersham Biosciences.

In addition, the probe or target may be labeled using any other technique known in the art. Preferred techniques include direct chemical labeling and enzymatic labeling, such as kinase labeling and nick translation. Rather than being synthesized in-house, labeled probes can also be obtained from various commercial sources, including GENSET.

Typically, the label can be attached to any portion of the probe or target polynucleotide, including the free end of one or more bases. When the label is attached to the solid support by a polynucleotide, it must be positioned at a location where it can be released from the solid support by cleavage with a mismatch-specific endonuclease, as described in co-owned, co-pending U.S. patent application 09/858,408 (incorporated herein by reference in its entirety). Preferably, the label position does not interfere with hybridization, ligation, cleavage, or other post-hybridization modification of the labeled polynucleotide.

Some embodiments of the invention use multiplex labeling, i.e., a plurality of distinguishable labels (e.g., different fluorophores). Multiplex labeling allows simultaneous detection of many sequences in a single hybridization reaction. For example, multiplex labeling with 4 colors reduces the number of hybridizations required by a factor of 4.

Other embodiments use informative probe pools to reduce the redundancy typically found in SBH protocols, thus reducing the number of hybridization reactions required to unambiguously determine a target DNA sequence. Informative probe pools and methods for their use can be found in co-owned, co-pending U.S. patent application 09/479,608, which is incorporated herein by reference in its entirety.

2. Attachment of polynucleotides to solid substrates

Some embodiments of the invention entail attaching a polynucleotide, such as a target DNA fragment, to a solid substrate. In a preferred embodiment, suitable DNA samples are detectably labeled and randomly attached to a solid substrate at a concentration of 1 fragment per pixel.
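Because attachment is random, "1 fragment per pixel" can only be an average. A minimal sketch, assuming fragments land independently and uniformly so that the count per pixel follows a Poisson distribution (a modeling assumption not stated in the text), estimates how many pixels end up empty, singly occupied or multiply occupied.

```python
import math


def pixel_occupancy(mean_per_pixel: float) -> dict[str, float]:
    """Fractions of pixels holding 0, exactly 1, or 2+ fragments.

    Assumes a Poisson model: fragments attach independently and uniformly,
    with the given mean number of fragments per pixel.
    """
    p0 = math.exp(-mean_per_pixel)      # P(no fragment)
    p1 = mean_per_pixel * p0            # P(exactly one fragment)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


occ = pixel_occupancy(1.0)
print({state: round(fraction, 3) for state, fraction in occ.items()})
```

At a mean of one fragment per pixel, this model predicts that about 37% of pixels hold exactly one fragment, about 37% are empty and about 26% hold two or more.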

The nature and geometry of the solid substrate depend on various factors, including the type of array and the mode of attachment (i.e., covalent or non-covalent). In general, the substrate may be composed of any material that allows immobilization of the polynucleotide and that does not melt or significantly degrade under the conditions used to hybridize and/or denature nucleic acids. Furthermore, for covalent immobilization, the substrate should be activatable with reactive groups capable of forming a covalent bond with the polynucleotide to be immobilized.

Many materials suitable for use as substrates in the present invention have been described in the art. In a preferred embodiment, the substrate consists of an optically transparent material, such as a glass slide. Other suitable exemplary materials include, for example, acrylic, styrene-methyl methacrylate copolymers, ethylene/acrylic acid, acrylonitrile-butadiene-styrene (ABS), ABS/polycarbonate, ABS/polysulfone, ABS/polyvinyl chloride, ethylene propylene, ethylene vinyl acetate (EVA), cellulose nitrate, nylons (including nylon 6, nylon 6/6, nylon 6/6-6, nylon 6/10, nylon 6/12, nylon 11 and nylon 12), polyacrylonitrile (PAN), polyacrylate, polycarbonate, polybutylene terephthalate (PBT), polyethylene terephthalate (PET), polyethylene (including low density, linear low density, high density, crosslinked and ultra-high molecular weight grades), polypropylene homopolymers, polypropylene copolymers, polystyrene (including general purpose and high impact grades), polytetrafluoroethylene (PTFE), fluorinated ethylene-propylene (FEP), ethylene-tetrafluoroethylene (ETFE), perfluoroalkoxy (PFA), polyvinyl fluoride (PVF), polyvinylidene fluoride (PVDF), polychlorotrifluoroethylene (PCTFE), ethylene-chlorotrifluoroethylene (ECTFE), polyvinyl alcohol (PVA), styrene-acrylonitrile (SAN), styrene maleic anhydride (SMA), metal oxides and glass.

Typically, the polynucleotide fragments may be bound to the support by means of suitable reactive groups. Such reactive groups are well known in the art and include, for example, amino (-NH2), hydroxyl (-OH) and carboxyl (-COOH) groups. The support-linked polynucleotide fragments can be prepared by any method known to those skilled in the art using any suitable support, such as glass. Immobilization can be accomplished by a number of methods, including, for example, passive adsorption (Inouye and Hondo, J. Clin. Microbiol. 28:1469-1472 (1990), incorporated herein by reference in its entirety), ultraviolet light (Dahlen et al., Mol. Cell. Probes 1:159-168 (1987), incorporated herein by reference in its entirety), covalent binding of base-modified DNA (Keller et al., Anal. Biochem. 170:441-451 (1988); Keller et al., Anal. Biochem. 177:392-395 (1989), both incorporated herein by reference in their entirety), or formation of an amide group between the probe and the support (Zhang et al., Nucl. Acids Res. 19:3929-3933 (1991), incorporated herein by reference in its entirety).

It is contemplated that further suitable methods for use with the present invention include those described in PCT patent application WO90/03382 (Southern et al.), which is incorporated herein by reference. That method of preparing support-bound polynucleotide fragments comprises attaching a nucleoside 3'-reagent, through its phosphate group and by a covalent phosphodiester bond, to an aliphatic hydroxyl group carried by the support. An oligonucleotide is then synthesized on the support-bound nucleotide, and the protecting groups are removed from the synthesized oligonucleotide chain under standard conditions that do not cleave the chain from the support. Suitable reagents include nucleoside phosphoramidites and nucleoside phosphines.

Furthermore, addressable laser-activated photodeprotection can be used for chemical synthesis of oligonucleotides directly on glass surfaces, as described by Fodor et al., Science 251:767-773 (1991), incorporated herein by reference.

A specific method of preparing support-bound polynucleotide fragments is light-generated synthesis as described by Pease et al., Proc. Natl. Acad. Sci. USA 91:5022-5026 (1994), incorporated herein by reference. These authors used existing photolithographic techniques to generate arrays (DNA chips) on which oligonucleotide probes are immobilized. These methods, which use light to direct oligonucleotide probe synthesis in high-density, miniaturized arrays, employ photolabile 5'-protected N-acyl-deoxynucleoside phosphoramidites, surface attachment chemistry and various combinatorial synthesis strategies. The method can be used to generate a matrix of 256 spatially defined oligonucleotide probes, which can then be used for SBH as described above.
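The combinatorial idea behind light-directed synthesis can be sketched as follows: in each coupling step a mask exposes only the sites that need a given base at a given position, so all 4^k sequences of length k are built in 4k steps. This is a generic illustration under simplifying assumptions (one site per sequence, perfect deprotection and coupling); the masks are hypothetical and this is not presented as the specific protocol of Pease et al.

```python
from itertools import product

BASES = "ACGT"


def photolithographic_plan(k: int):
    """Simulate light-directed combinatorial synthesis of all 4**k k-mers.

    One array site per target sequence. In each step a (hypothetical) mask
    exposes the sites whose sequence needs base `b` at position `pos`; those
    sites are deprotected and extended by one nucleotide.
    Returns the grown sequences and (position, base, sites exposed) per step.
    """
    targets = ["".join(p) for p in product(BASES, repeat=k)]
    grown = [""] * len(targets)
    steps = []
    for pos in range(k):
        for b in BASES:
            exposed = [i for i, t in enumerate(targets) if t[pos] == b]
            for i in exposed:
                grown[i] += b          # couple one photoprotected nucleotide
            steps.append((pos, b, len(exposed)))
    if grown != targets:               # sanity check: each site holds its intended k-mer
        raise RuntimeError("synthesis plan did not reproduce the target set")
    return grown, steps


probes, steps = photolithographic_plan(4)
print(len(probes), len(steps))         # 256 16
```

With k = 4 this yields 256 distinct probes in 16 mask steps, each step exposing 64 of the 256 sites.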

In a preferred embodiment, the DNA fragments of the invention are attached to a solid matrix by a linker moiety. The linker may be composed of atoms capable of forming at least two covalent bonds, such as carbon, silicon, oxygen, sulfur, phosphorus and the like, or of molecules capable of forming at least two covalent bonds, such as sugar-phosphate groups, amino acids, peptides, nucleosides, nucleotides, sugars, carbohydrates, aromatic rings, hydrocarbon rings, linear and branched hydrocarbons, and the like. In a particularly preferred embodiment of the present invention, the linker moiety is composed of alkylene glycol moieties. In a preferred embodiment, a detectable label is attached to the DNA fragment (i.e., the target DNA).

3. Formation of detectably labeled duplexes on solid supports

In a preferred embodiment of the invention, a labeled probe is joined by complementary base pairing to a detectably labeled target nucleic acid that is itself attached to a solid support as part of a polynucleotide array, thereby forming a duplex. In another preferred embodiment, the labeled probe is covalently attached (i.e., ligated) to a second probe when both probes hybridize in a contiguous manner to a target nucleic acid that is itself attached to a solid support as part of a spatially addressable polynucleotide array.

As used herein, nucleotides are base-paired or "complementary" if they form a stable duplex or binding pair under specified conditions. The specificity of one base for another is dictated by the availability and orientation of hydrogen bond donors and acceptors on the bases. For example, under conditions commonly used for hybridization assays, adenine (A) pairs with thymine (T) but not with guanine (G) or cytosine (C). Similarly, G pairs with C but not with A or T. Bases that interact in a less specific manner, such as hypoxanthine or universal bases (M bases, Nichols et al., Nature 369: 492-. Nucleotide bases that are not complementary to each other are referred to as "mismatches".

A pair of polynucleotides, such as a probe and a target nucleic acid, is said to be "complementary" or "paired" if the two hybridize to each other under certain conditions through interactions mediated by complementary base pairing, thus forming a duplex. The duplex formed between two polynucleotides may include one or more base mismatches; such duplexes are referred to as "mismatched duplexes" or hybrid duplexes. The more relaxed the hybridization conditions, the more likely it is that mismatches are tolerated and that a relatively stable mismatched duplex can form.

A subset of paired polynucleotides, referred to as "perfectly complementary" or "perfectly matched" polynucleotides, consists of paired polynucleotides containing contiguous base sequences that are complementary to each other with no mismatches (i.e., leaving aside sequence-context effects, the duplex formed has the greatest binding energy possible for the particular nucleic acid sequence). The terms "perfectly complementary" and "perfectly matched" also cover polynucleotides and duplexes containing analogs or modified nucleotides. "Perfect pairing" for an analog or modified nucleotide is judged according to the "perfect pairing rule" selected for that analog or modified nucleotide (e.g., the binding pair having the greatest binding energy for the particular analog or modified nucleotide).

When probe pools with degenerate ends of the N_xB_yN_z type described above are used, perfect pairing requires only that the informative region of the probe, i.e., the B_y region, be perfectly matched in the duplex. Mismatches in the N regions do not affect the outcome of the hybridization experiment, as such mismatches do not interfere with the information derived from the experiment.
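The definitions of mismatch and perfect pairing for degenerate probes can be expressed as a simple sequence check. The sketch below assumes both strands are written 5' to 3' and that only standard Watson-Crick pairs count; it deliberately ignores thermodynamics and sequence-context effects, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(probe: str, target_site: str) -> list[int]:
    """Probe positions (counted 5'->3') that fail to Watson-Crick pair.

    Both sequences are written 5'->3'. Because the duplex is antiparallel,
    probe position i pairs with target position len(target_site) - 1 - i.
    """
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have the same length")
    n = len(probe)
    return [i for i in range(n) if COMPLEMENT[target_site[n - 1 - i]] != probe[i]]


def informative_perfect_match(probe: str, target_site: str, x: int, y: int) -> bool:
    """True if the B_y block (probe positions x .. x+y-1) pairs without mismatches.

    Mismatches that fall in the degenerate N_x or N_z ends are ignored.
    """
    return not any(x <= i < x + y for i in mismatch_positions(probe, target_site))


probe = "ACGTACG"                                               # read as N_1 B_5 N_1
print(mismatch_positions(probe, "CGTACGT"))                     # -> []
print(informative_perfect_match(probe, "AGTACGT", x=1, y=5))    # -> True
print(informative_perfect_match(probe, "CGTGCGT", x=1, y=5))    # -> False
```

Here the probe ACGTACG pairs perfectly with target site CGTACGT. Changing the first target base to A creates a mismatch only in the degenerate N_z end, so the informative region still counts as perfectly matched, whereas a change inside the B_5 block does not.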

In a particularly preferred embodiment of the invention, polynucleotide arrays are provided in which target DNA fragments on a solid substrate are exposed, under conditions that allow hybridization, to at least one set of detectably labeled oligonucleotide probes provided in solution. Probe lengths may be the same or different within a set or between sets. Criteria for determining suitable hybridization conditions can be found in the literature, e.g., Drmanac et al. (1990), Khrapko et al. (1991), Broude et al. (1994) (all cited above) and WO98/31836, incorporated herein by reference in their entirety. These articles describe hybridization temperature ranges, buffers and wash steps suitable for the initial steps of SBH. The probe sets may be applied to the target nucleic acid separately or simultaneously.

Probes that hybridize to contiguous sites on a target nucleic acid are covalently attached, or ligated, to each other. Ligation may be achieved with chemical ligating agents (e.g., water-soluble carbodiimides or cyanogen bromide), with ligases such as commercially available T4 DNA ligase, by stratification, or by any other method that causes a chemical bond to form between adjacent probes. Criteria for determining suitable ligation conditions can be found in the literature, for example, commonly owned U.S. patent applications 09/458,900, 09/479,608 and 10/738,108, which are incorporated herein by reference in their entirety.

4. Random array SBH (rSBH)

The method of the invention uses random-array SBH (rSBH), which extends the combinatorial ligation approach to single-molecule arrays, greatly increasing the sensitivity and capacity of the method. rSBH relies on the serial interrogation of randomly arrayed DNA fragments with labeled oligonucleotide libraries. In the method of the invention, a complex mixture of DNA to be sequenced is displayed on an optically transparent surface in the focal plane of a total internal reflection fluorescence microscopy (TIRM) platform and is continuously monitored with an ultrasensitive megapixel CCD camera. The DNA fragments are arrayed at a density of about 1 to 3 molecules per square micron, an area corresponding to an individual CCD pixel. TIRM is used to visualize focal and close contacts between the objects under investigation and the surface to which they are attached. In TIRM, the evanescent field of an internally reflected excitation source selectively excites fluorescent molecules at or near the surface, resulting in very low background scattered light and a good signal-to-background ratio. The background and its associated noise can be made low enough to detect single fluorescent molecules under ambient conditions (see Abney et al., Biophys. J. 61:542-552 (1992); Ambrose et al., Cytometry 36: 224-.
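The surface selectivity of total internal reflection comes from the exponential decay of the evanescent field with distance from the surface. A minimal sketch using the standard penetration-depth formula is shown below; the wavelength, incidence angle and refractive indices are illustrative assumptions, not values given in the text.

```python
import math


def penetration_depth_nm(wavelength_nm: float, theta_deg: float,
                         n_glass: float = 1.518, n_sample: float = 1.33) -> float:
    """1/e intensity depth of the evanescent field under total internal reflection.

    d = lambda / (4 * pi) * (n1**2 * sin(theta)**2 - n2**2) ** -0.5
    The default indices (glass slide, aqueous buffer) are illustrative.
    """
    s = (n_glass * math.sin(math.radians(theta_deg))) ** 2 - n_sample ** 2
    if s <= 0:
        raise ValueError("angle is at or below the critical angle: no total internal reflection")
    return wavelength_nm / (4 * math.pi) / math.sqrt(s)


def relative_excitation(z_nm: float, depth_nm: float) -> float:
    """Excitation intensity at height z above the surface, as a fraction of I(0)."""
    return math.exp(-z_nm / depth_nm)


d = penetration_depth_nm(532, 65)
print(round(d, 1), round(relative_excitation(500, d), 4))
```

With a 532 nm source at 65° through glass (n ≈ 1.518) into aqueous buffer (n ≈ 1.33), the depth comes out at roughly 120 nm, so a fluorophore 500 nm above the surface receives under 2% of the excitation available at the surface.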

Using microfluidic technology, pairs of probe pools labeled with donor and acceptor fluorophores are mixed with DNA ligase and delivered to the random array. When the probes hybridize to adjacent sites on a target fragment, they are ligated together, generating a fluorescence resonance energy transfer (FRET) signal. FRET is a distance-dependent interaction (over 10-100 angstroms) between the electronic excited states of two fluorescent molecules, in which excitation is transferred from a donor molecule to an acceptor molecule without emission of a photon (Didenko, Biotechniques 31:1106-1121 (2001); Ha, Methods 25:78-86 (2001); Klostermeier and Millar, Biopolymers 61:159-179 (2001-2002), incorporated herein by reference in their entirety). These signals, detected by the CCD camera, indicate the presence of the paired sequence within the fragment. Once the signal from the first pool pair is detected, the probes are removed and different probe combinations are tested in successive cycles. The entire sequence of each DNA fragment is compiled from the fluorescent signals generated by hundreds of independent hybridization/ligation events.
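The steep distance dependence of FRET follows the Förster relation E = 1/(1 + (r/R0)^6), where R0 is the dye-pair-specific distance at which half of the excitation is transferred. The sketch below uses an R0 of 50 Å purely as a placeholder; real values depend on the donor-acceptor pair chosen.

```python
def fret_efficiency(r_angstrom: float, r0_angstrom: float = 50.0) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    r0_angstrom is a placeholder; the actual Förster radius depends on the
    donor-acceptor dye pair and is not given in the text.
    """
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


for r in (20, 50, 100):
    print(r, round(fret_efficiency(r), 3))
```

With this placeholder R0, a donor-acceptor separation of 20 Å gives about 99.6% transfer, 50 Å gives 50% and 100 Å only about 1.5%, which illustrates why a signal is expected mainly from probes held next to each other on a target rather than from probes diffusing freely.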

While a single detectable color is sufficient, multiple colors allow more probe combinations to be tested at once and increase the efficiency of the system. The current state of the art suggests that four colors can be used simultaneously. In addition to conventional direct fluorescence strategies, time-resolved systems and time-resolved FRET signaling systems may also be used (Didenko, Biotechniques 31: 1106-. Newer chemistries, such as a quantum dot-enhanced triple-FRET system, may also be used. Weak signals can be overcome by dendrimer technology and related signal amplification techniques.

Unlike traditional hybridization methods, the present methods rely on the synergy of hybridization and ligation, in which short probes from two pools are ligated together to produce longer probes with greater informational capacity. For example, two sets of 1024 5-mer oligonucleotides can be combined to detect more than one million possible 10-mer sequence strings. The use of informative probe libraries, in which all probes in a pool share a common label, greatly simplifies the process, allowing millions of potential probe pairings to be tested with only a few hundred pool combinations. Reading consecutive bases with multiple overlapping probes allows accurate determination of DNA sequences from the resulting hybridization pattern. Extending the combinatorial ligation and informative library techniques described above to single-molecule sequencing further enhances them.
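The combinatorial arithmetic and the ligation readout described above can be sketched in a few lines. For simplicity, each probe is represented by the target sequence it reads rather than by its complementary sequence, and a probe pair is scored as ligated wherever its concatenation occurs in the target; hybridization stringency, mismatches and strand orientation are ignored.

```python
from itertools import product


def all_kmers(k: int) -> list[str]:
    """Every sequence of length k over A, C, G, T (one complete probe pool)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def ligation_events(target: str, k1: int = 5, k2: int = 5) -> set[tuple[str, str]]:
    """Probe pairs (pool 1, pool 2) expected to hybridize contiguously and ligate.

    Simplification: each probe is written as the target sequence it reads, so a
    pair scores wherever a (k1 + k2)-base window occurs in the target, with the
    pool-1 probe reading the first k1 bases and the pool-2 probe the next k2.
    """
    w = k1 + k2
    return {(target[i:i + k1], target[i + k1:i + w])
            for i in range(len(target) - w + 1)}


pool_1, pool_2 = all_kmers(5), all_kmers(5)
print(len(pool_1), len(pool_1) * len(pool_2))   # 1024 1048576

hits = ligation_events("ACGTTGCAATGCCGTA")
for a, b in sorted(hits):
    print(a, "+", b)
```

Two pools of 1024 5-mers define 1,048,576 possible 10-mer strings, while the 16-base example target yields only 7 ligation-positive pairs, one for each 10-base window.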

5. Preparation of random DNA constructs

DNA isolation and initial fragmentation

DNA isolation is performed using well-established protocols (Sambrook et al., supra, 1999; "New Compiled Molecular Biology Laboratory Manual," edited by Ausubel et al., John Wiley and Sons, New York, 1999, both incorporated herein by reference in their entirety) or commercial kits (e.g., kits commercially available from QIAGEN (Valencia, CA) or Promega (Madison, WI)). The important requirements are: 1) the DNA is free of DNA-processing enzymes and contaminating salts; 2) the entire genome is uniformly represented; and 3) the DNA fragments are between 5,000 and 10,000 bases in length. Digestion of the DNA is not required, as the shear forces generated during lysis and extraction produce fragments in the desired range. In another embodiment, shorter fragments (1-5 kb) may be produced by enzymatic fragmentation. An input of 10-100 genome copies ensures complete genome coverage and compensates for a poor capture rate of targets on the array. Yet another embodiment, for use with small amounts of DNA, provides a vector (a circular, synthetic double-stranded DNA).
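The effect of input copy number can be estimated with a simple independence model: if each input genome copy contributes a given region to the array with probability p, the chance that the region is represented at least once is 1 - (1 - p)^N. Both the model and the capture rates used below are illustrative assumptions, not figures from the text.

```python
def prob_represented(genome_copies: int, capture_rate: float) -> float:
    """Probability that a given genomic region is captured at least once.

    Simplifying assumption: each input genome copy contributes the region
    independently with probability `capture_rate`.
    """
    if not 0.0 <= capture_rate <= 1.0:
        raise ValueError("capture_rate must be between 0 and 1")
    return 1.0 - (1.0 - capture_rate) ** genome_copies


for copies in (10, 100):
    for rate in (0.1, 0.3):
        print(copies, rate, round(prob_represented(copies, rate), 4))
```

Under this model, 10 copies at a 30% capture rate leave about a 3% chance that a region is missed, while 100 copies keep the miss rate below 0.01% even at a 10% capture rate.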

DNA normalization

In some embodiments, normalization of environmental samples is necessary to reduce the DNA content of prevalent species and thereby maximize the total number of different species sequenced per array. Since rSBH requires as few as 10 genome equivalents, thorough DNA normalization or reduction methods can be performed. Normalization can be accomplished using methods commonly used to normalize cDNA libraries during their generation. The DNA collected from the sample is split into two portions, one of which is 10 times greater in mass than the other. The larger portion is biotinylated with terminal transferase and ddCTP and attached as single-stranded DNA to a streptavidin column or streptavidin-coated beads. Alternatively, biotinylated random primers can be used to generate sequences that bind to streptavidin. Whole genome amplification methods (Molecular Staging, New Haven, CT) can also be used. The sample to be normalized is then hybridized to the attached molecules, and molecules that are over-represented in the sample are preferentially removed from solution because of the large number of their binding sites. Several hybridization/removal cycles can be applied to the same sample to obtain complete normalization. Another embodiment provides for efficient hybridization of long double-stranded DNA fragments without DNA denaturation, using timed lambda exonuclease digestion to produce short single-stranded terminal regions.
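The principle behind this normalization, that abundant species find their immobilized complements faster and are therefore removed preferentially, can be illustrated with a toy kinetic model. The pseudo-first-order kinetics, the lumped rate constant k and the example abundances are all assumptions made for illustration.

```python
import math


def normalize(abundances: dict[str, float], k: float = 5.0,
              cycles: int = 1) -> dict[str, float]:
    """Toy model of normalization against an immobilized driver portion.

    Assumptions: the driver has the same composition as the original sample,
    and each species is captured with pseudo-first-order kinetics, so the
    fraction left in solution per cycle is exp(-k * a) for a species of
    relative abundance a. k lumps rate constant, time and driver excess.
    """
    remaining = dict(abundances)
    for _ in range(cycles):
        remaining = {s: c * math.exp(-k * abundances[s]) for s, c in remaining.items()}
    total = sum(remaining.values())
    return {s: c / total for s, c in remaining.items()}   # fractions of what is left


sample = {"dominant": 0.90, "common": 0.09, "rare": 0.01}
for n in (1, 3):
    print(n, {s: round(f, 3) for s, f in normalize(sample, cycles=n).items()})
```

With the default k, one cycle raises the rare species from 1% to roughly 12% of the DNA left in solution, while the dominant species falls from 90% to roughly 13%. Additional cycles keep depleting the most abundant species and, in this simple model, can overshoot.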

Another embodiment provides for sequencing low-abundance members, which are otherwise difficult to analyze, by combining DNA normalization with rSBH. Normalizing one sample against another allows changes in community structure to be monitored as conditions change, and allows new members that appear under changed conditions to be identified.

C. Secondary DNA fragmentation and linker ligation

The present invention provides for suspending the long DNA fragments generated by shear forces in a solution within a chamber located on a slide. The DNA concentration is adjusted so that each fragment occupies a 50x50x50 micron volume. The reaction chamber contains a mixture of restriction endonucleases, T4 DNA ligase, a strand-displacing polymerase and specially designed linkers. Partial digestion of the DNA with restriction enzymes produces fragments with an average length of 250 bp and identical overhang sequences. T4 DNA ligase ligates non-phosphorylated double-stranded linkers to the ends of the genomic fragments via complementary cohesive ends, creating a stably structured genomic insert with one linker at each end but with a nick in one strand, where the ligase cannot catalyze phosphodiester bond formation (FIG. 1). T4 DNA ligase is active in most restriction enzyme buffers, but ATP must be added, together with a molar excess of linker over genomic DNA, to promote ligation of linkers to each end of the genomic molecules. The use of non-phosphorylated linkers is important to prevent linker-to-linker ligation. In addition, each linker contains two primer binding sites and is held in a hairpin structure by cross-linked bases at the end of the hairpin, which prevents the linker from dissociating during high-temperature melting. Extension from the 3' end with a strand-displacing polymerase such as Vent or Bst produces a DNA strand having linker sequences at both ends. However, the linker at one end remains in the hairpin structure, which prevents it from binding the complementary sequence at the other end of the DNA fragment.
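The requirement of one fragment per 50x50x50 micron volume corresponds to a very dilute solution. The sketch below performs the unit conversion; the 650 g/mol average mass per base pair and the 7.5 kb example length (midway through the 5-10 kb range given earlier) are assumptions.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0         # approximate average mass of one double-stranded base pair


def one_molecule_per_cube(side_um: float, fragment_bp: int) -> dict[str, float]:
    """Concentration at which a cube of the given side holds one molecule on average."""
    volume_ml = (side_um * 1e-4) ** 3                 # 1 micron = 1e-4 cm; 1 cm^3 = 1 mL
    molecules_per_ml = 1.0 / volume_ml
    grams_per_molecule = fragment_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return {
        "molecules_per_ml": molecules_per_ml,
        "femtomolar": molecules_per_ml * 1e3 / AVOGADRO * 1e15,   # mol/L scaled to fM
        "picograms_per_ml": molecules_per_ml * grams_per_molecule * 1e12,
    }


print(one_molecule_per_cube(50, 7_500))
```

For a 50 micron cube this works out to about 8 x 10^6 molecules per mL (about 13 fM), or roughly 65 pg/mL for a 7.5 kb fragment.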

The present invention also provides random DNA arrays for sequencing multiple highly similar samples (e.g., DNA from individual patients) in one assay by tagging the DNA fragments of each sample before the random array is formed. One or both of the linkers used to incorporate primer sequences into the ends of the DNA fragments may carry a tag cassette, and a different tag cassette may be used for each sample. After attachment of the linkers (preferably by ligation), the DNA from all samples is mixed to form a single random array. After sequencing of the fragments is complete, the fragments belonging to each sample are identified by their designated tag sequence. This tagging method allows efficient sequencing of a small number of tagged DNA regions from about 10-1000 samples on a high-capacity random array with up to about ten million DNA fragments.
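Assigning fragments back to their samples after sequencing amounts to a lookup on the tag sequence. The sketch below assumes, hypothetically, that the tag occupies the first `tag_len` bases of each decoded fragment and requires exact matches. A helper reports the minimum pairwise Hamming distance of a tag set, since tags that differ at 3 or more positions would allow a single read error to be corrected by nearest-tag assignment.

```python
from itertools import combinations


def demultiplex(reads: list[str], sample_tags: dict[str, str],
                tag_len: int) -> dict[str, list[str]]:
    """Group decoded fragment sequences by the sample tag they carry.

    Hypothetical layout: the tag is the first `tag_len` bases of each read.
    Only exact tag matches are assigned; anything else is 'unassigned'.
    """
    tag_to_sample = {tag: name for name, tag in sample_tags.items()}
    groups: dict[str, list[str]] = {name: [] for name in sample_tags}
    groups["unassigned"] = []
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        groups[tag_to_sample.get(tag, "unassigned")].append(insert)
    return groups


def min_hamming_distance(tags: list[str]) -> int:
    """Smallest number of differing positions between any two equal-length tags."""
    return min(sum(a != b for a, b in zip(t1, t2))
               for t1, t2 in combinations(tags, 2))


tags = {"patient_1": "ACGT", "patient_2": "TGCA"}
print(demultiplex(["ACGTGGGG", "TGCACCCC", "AAAATTTT"], tags, tag_len=4))
print(min_hamming_distance(list(tags.values())))
```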

DNA attachment and in situ amplification

The adaptor-ligated genomic DNA, together with the other fragments derived from the same original 5-100 kb fragment, is then arrayed on the slide by hybridization to an oligonucleotide complementary to the adaptor sequence (primer B). Following linker ligation and DNA extension, the solution is heated to denature the molecules; during the reannealing phase, the denatured molecules contact the high-concentration primer oligonucleotides attached to the slide surface and hybridize to these complementary sequences. In another embodiment, in which no in situ amplification occurs, the adaptor is attached to the support and the DNA fragments are ligated to it. Most of the DNA constructs generated from one parent molecule remain localized within an area of about 50x50 microns on the slide; thus, if restriction enzyme digestion of a parent molecule yields 1000 molecules, each fragment occupies an average area of 1-4 square microns. A 1-4 square micron area can be viewed with a single pixel of a CCD camera and represents one virtual reaction cell within an array of one million wells.
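The area estimate above is simple arithmetic; the sketch makes the numbers explicit (a 50x50 micron footprint per parent molecule and 1000 fragments per parent, both taken from the text). The pixel areas in the usage lines are illustrative values within the stated 1-4 square micron range.

```python
def area_per_fragment_um2(region_side_um: float, fragments_per_parent: int) -> float:
    """Average surface area per fragment when one parent molecule's fragments
    stay within a square region of the given side length."""
    return region_side_um ** 2 / fragments_per_parent


def fragments_per_pixel(pixel_area_um2: float, region_side_um: float,
                        fragments_per_parent: int) -> float:
    """Expected number of those fragments imaged by one camera pixel."""
    return pixel_area_um2 / area_per_fragment_um2(region_side_um, fragments_per_parent)


print(area_per_fragment_um2(50, 1000))         # 2.5 (square microns)
print(fragments_per_pixel(1.0, 50, 1000))      # 0.4 with a 1 square micron pixel
print(fragments_per_pixel(4.0, 50, 1000))      # 1.6 with a 4 square micron pixel
```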

A 50-100 micron thick capillary chamber prevents turbulent liquid flow, so DNA fragments cannot diffuse significantly more than 50 microns laterally across the slide surface in a short time. High-viscosity buffers or gels may also be used to minimize diffusion. In another example, several hundred short DNA fragments from a single 5-100 kb molecule are spread over a 50x50 micron surface, which requires only limited turbulence. Confinement need not be perfect, because SBH can analyze a mixture of several DNA fragments at the same pixel site. Several fractions of the original sample with more uniform fragment lengths (e.g., 5-10 kb, 10-20 kb, 20-40 kb, 40-1000 kb) can be prepared so that the short fragments are spaced more evenly. Furthermore, an electric field can be applied to the library to attach short DNA fragments to the surface. An array with partial structure, in which short fragments are mixed locally, is nearly as effective as a fully structured array. This is because the short fragments from any single long fragment are not mixed with the short fragments produced from the roughly 10,000 other long starting fragments.
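The statement that fragments stay within about 50 microns "in a short time" can be checked against the two-dimensional random-walk relation, mean squared displacement = 4Dt. The diffusion coefficients below are assumed order-of-magnitude values, not figures from this description. One is for a few-hundred-base-pair fragment in aqueous buffer; the other is for a buffer about ten times more viscous. The model ignores surface capture and any residual flow.

```python
# Sketch: free 2D diffusion estimate, <r^2> = 4 D t.
# Both diffusion coefficients are assumed, order-of-magnitude values.
import math


def rms_displacement_um(d_um2_per_s: float, t_s: float) -> float:
    """Root-mean-square 2D displacement: sqrt(4 D t)."""
    return math.sqrt(4.0 * d_um2_per_s * t_s)


def time_to_reach_s(distance_um: float, d_um2_per_s: float) -> float:
    """Time for the RMS 2D displacement to reach a given distance: r^2 / (4 D)."""
    return distance_um ** 2 / (4.0 * d_um2_per_s)


D_BUFFER = 10.0  # um^2/s, assumed for a few-hundred-bp fragment in water
D_VISCOUS = 1.0  # um^2/s, assumed for a ~10x more viscous buffer

for label, d in [("buffer", D_BUFFER), ("viscous", D_VISCOUS)]:
    print(f"{label}: about {time_to_reach_s(50.0, d):.0f} s to drift ~50 um")
```

D scales inversely with viscosity (Stokes-Einstein), so a tenfold more viscous buffer stretches the drift time tenfold. This is one rationale for the high-viscosity buffers and gels mentioned above.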

Another embodiment of the invention provides a ligation method for attaching two primer sequences to a DNA fragment. The method targets the single-stranded DNA produced by denaturing double-stranded DNA fragments. Because single-stranded DNA has distinct 5' and 3' ends, each end can be ligated to a specific primer sequence. Two specific adaptors are designed, each comprising two oligonucleotides with specifically modified ends (FIG. 2). Here F and B denote the free (in-solution) primer and the surface-bound primer sequences, and f and b denote the sequences complementary to them (i.e., f is complementary to F). Only primer F carries the 3'-OH group needed for ligation to the DNA fragment; the other oligonucleotide may have a dideoxy 3' terminus (dd) to prevent linker-to-linker ligation. Oligonucleotide b carries a 5'-phosphate group (P). Oligonucleotide B may also carry a 5'-P group so that it can be degraded after linker ligation, exposing the b sequence for hybridization to the surface-attached primer/capture probe B. Oligonucleotides f and B contain several degenerate bases (N), about 3 to 9 and preferably 5 to 7. This allows the adaptors to be ligated to any DNA fragment generated from the source DNA by random fragmentation.
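The strand orientation implied by this adaptor design can be checked with simple string operations. In this reading, F ends up at the 5' end of the target strand and b at its 3' end. A copy primed from surface primer B therefore ends in f, which is where solution primer F anneals. All sequences are hypothetical placeholders. The degenerate splint bases and the ligation chemistry are not modeled.

```python
# Sketch: check the strand logic of the two-adaptor design with hypothetical
# primer sequences written 5'->3'. Following the text, f and b are the
# complements of F and B.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


F = "GGATCCTTAA"  # free primer F (hypothetical sequence)
b = "CTGACTGACG"  # oligonucleotide b, ligated to the target's 3' end (hypothetical)
B = revcomp(b)    # surface-bound primer B is complementary to b
f = revcomp(F)    # complement of primer F


def ligate(target: str) -> str:
    """5'-F-target-b-3': F (3'-OH) joins the target's 5' end, b (5'-P) its 3' end."""
    return F + target + b


def copy_from_surface_primer(template: str) -> str:
    """Full-length copy made by extending surface primer B annealed to the b end."""
    if not template.endswith(b):
        raise ValueError("template lacks the b sequence that primer B anneals to")
    return revcomp(template)


strand = ligate("ATTGCAGGCT")
copy = copy_from_surface_primer(strand)
print(copy.startswith(B), copy.endswith(f))  # True True
```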

Although the rSBH assay is designed for single-molecule detection, some embodiments amplify each DNA target in situ. The method of the invention provides isothermal exponential amplification within a micron-sized localized amplicon, referred to herein as an "ampliot" (amplicon site) (FIG. 3). Amplification uses a surface-bound primer (primer B) and a free primer in solution (primer F). Primer B first hybridizes to the original target sequence and is extended to copy it. The non-attached strand is then melted and washed away. Fresh reagents are then added, including a DNA polymerase with strand-displacement activity (e.g., Bst DNA polymerase), dNTPs and primer F. New strands are synthesized in successive amplification reactions, each displacing the previously synthesized complement.

The successive exponential amplification reactions produce displaced strands containing sequences complementary to the capture oligonucleotides on the array. These strands are therefore captured in turn and used as templates for further amplification. This strand displacement method requires primers that can initiate polymerization repeatedly. Several strategies are described in the art, such as ICAN™ technology (Takara Bio Europe, Gennevilliers, France) and SPIA technology (NuGEN, San Carlos, CA; U.S. Pat. No. 6,251,639, incorporated herein by reference in its entirety). Once extension has begun, the primer is removed using the ability of RNase H to degrade RNA in an RNA/DNA duplex. This allows another primer to hybridize and initiate polymerization and strand displacement. In a preferred embodiment, the primer F site is placed in an A/T-rich region of the adaptor. This lets the double-stranded DNA denature frequently and the F primer bind at the optimal temperature of the chosen DNA polymerase. Approximately 100 to 1000 copies per ampliot are generated by successive exponential amplification without thermal cycling.

Another embodiment of the invention incorporates a T7 promoter into the linker and synthesizes RNA as an intermediate (FIG. 4). Double-stranded DNA is first generated on the slide surface using a nick-translation or strand-displacing polymerase. The newly formed strand serves as a template for T7 RNA polymerase, and the required double-stranded promoter is formed by extension from primer B. Transcription from the promoter produces RNA strands that can hybridize to nearby surface-bound primers. These hybridized strands are then reverse transcribed with a reverse transcriptase. This linear amplification method can produce 100-1000 copies of the target. The resulting cDNA can then be converted to single-stranded DNA, either by degrading the RNA strand of the RNA/DNA duplex with RNase H or by alkali and heat treatment. To minimize intramolecular hybridization of primer B sequences in the RNA molecule, half of the primer B sequence can be derived from the T7 promoter sequence. This reduces the complementary sequence generated to about 10 bases.

Both amplification methods are isothermal to ensure that diffusion of the synthesized strands stays within the ampliot. The ampliot is about 2 microns in size but can be as large as 10 microns, because the amplified DNA signal can compensate for the resulting 25-fold increase in total surface background per CCD pixel. Furthermore, primer B attachment sites are approximately 10 nanometers apart (10,000 per square micron), which allows immediate capture of the displaced DNA. The closed capillary reaction chamber almost eliminates buffer turbulence.
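Both numbers in this paragraph follow from area scaling. The sketch below shows the arithmetic. It assumes attachment sites on a square grid. It also assumes, as an interpretation of the text, that the surface background collected per pixel scales with the imaged area of the ampliot.

```python
# Sketch: site density from spacing, and background growth with spot size.
# Square-grid spacing and area-proportional background are assumptions.


def sites_per_um2(spacing_nm: float) -> float:
    """Attachment sites per um^2 for a square grid with the given spacing."""
    return (1000.0 / spacing_nm) ** 2


def background_factor(large_um: float, small_um: float) -> float:
    """Relative surface background when the imaged spot grows from small to large size."""
    return (large_um / small_um) ** 2


print(sites_per_um2(10.0))           # 10000.0
print(background_factor(10.0, 2.0))  # 25.0
```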

Another embodiment of the invention provides a method for isothermal amplification with a strand-displacing enzyme, in which an invader oligonucleotide opens single-stranded DNA for primer annealing (see FIG. 5). Double-stranded DNA can be amplified at constant temperature using two primers, an invader oligonucleotide or other reagent, and a strand-displacing polymerase such as the Klenow fragment. The invader oligonucleotide is present at a concentration equal to or higher than that of the corresponding primer. Initially, the target DNA concentration is about 100 to 100 million times lower than the primer concentration. The isothermal amplification method with invader oligonucleotides comprises the following steps:

1) An invader oligonucleotide binds to one of the 5'-end sequences of the target DNA by strand invasion. It may be made from LNA, PNA or other modified moieties that bind strongly to DNA. The invader may be a single-stranded or double-stranded (Ds) overhang. Linkers can add (TA)x or similar low-duplex-stability sequences to the ends of the target DNA to aid invasion.

2) Primer 1 hybridizes to the exposed single-stranded DNA site and initiates primer extension, with the polymerase displacing one DNA strand. The invader oligonucleotide is partially complementary to primer 1. To avoid blocking the primer completely, the complementary portion is designed for the temperature and concentrations used. Its length and binding strength give an equilibrium bound-to-unbound ratio of about 9:1. The roughly 10% of primer 1 that remains free is still in large excess over the target DNA.

3) Primer 2 is hybridized to the other end of the single-stranded DNA, and a new double-stranded DNA is generated using polymerase.

4) Steps 1-3 are repeated, with both the starting and the newly formed dsDNA molecules re-entering step 1 in each round.
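The repeated steps 1-3 can be pictured with a toy expected-value model. In each round, a fraction of the existing dsDNA molecules is copied, and each new duplex consumes one molecule of primer 1 and one of primer 2. Growth is exponential while primers are in excess and plateaus once they run out. The model also computes the free-primer fraction implied by the 9:1 bound-to-unbound ratio. Kinetics, invader binding dynamics and mispriming are ignored, and the parameter values are illustrative.

```python
# Toy expected-value model of repeated steps 1-3 (not a kinetic model).
# Each cycle copies `efficiency` of the dsDNA, consuming one primer 1 and one
# primer 2 per new duplex. Parameter values are illustrative only.
from typing import List


def free_primer_fraction(bound_to_free: float) -> float:
    """Fraction of primer 1 left unblocked by the invader at a given bound:free ratio."""
    return 1.0 / (1.0 + bound_to_free)


def simulate(target0: float, primer0: float, efficiency: float, cycles: int) -> List[float]:
    """Return the dsDNA amount after each cycle (index 0 = start)."""
    ds, p1, p2 = target0, primer0, primer0
    history = [ds]
    for _ in range(cycles):
        new = min(ds * efficiency, p1, p2)  # cannot exceed the remaining primers
        ds += new
        p1 -= new
        p2 -= new
        history.append(ds)
    return history


print(free_primer_fraction(9.0))          # 0.1
print(simulate(1.0, 1e6, 1.0, 10)[-1])    # 1024.0
print(simulate(1.0, 100.0, 1.0, 20)[-1])  # 101.0 (primer-limited plateau)
```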

E. Probe and library design

One or more detectable colors may be used. Using multiple colors can reduce the number of ligation cycles and increase system efficiency, and the current state of the art suggests that four colors can be used simultaneously. Preferred embodiments of the present invention use FRET-based systems, such as time-resolved FRET signaling systems (Didenko, 2001, supra). Other labeling chemistries are also contemplated, such as quantum-dot-enhanced triple-FRET systems and dendrimer technology.

In a preferred embodiment, two sets of universal probes based on FRET detection are used. All 4096 possible hexamers are generated with 1024 or fewer individual syntheses. This uses the probe design previously described in commonly owned U.S. patent applications 09/479,608 and 10/608,298 (incorporated herein by reference in their entirety). Before use in experiments, the probes undergo screening (modeling) and QC (quality control) protocols (Callida Genomics, Sunnyvale, CA). Probes are designed to minimize differences in efficiency among them. The actual performance of each probe with perfectly matched and mismatched targets is determined by a QC assay and used by the advanced base-calling system (Callida Genomics).

6. Core technology

The method of the invention relies on three core technologies:

1) Universal probes, which allow sequencing by hybridization of DNA from any organism and detection of any possible sequence change. These probes are designed using statistical principles, without reference to known gene sequences (see co-owned, co-pending U.S. patent application 10/608,293, incorporated herein by reference in its entirety).

2) Combinatorial ligation, in which short probes from two universal sets are combined to generate thousands of long probe sequences. Excellent specificity comes from the "enzymatic proofreading" of DNA ligase (see U.S. patent application 10/608,293).

3) Informative probe libraries (IPPs), which are mixtures of hundreds of identically labeled probes with different sequences. They simplify the hybridization process without adversely affecting sequence determination (see U.S. patent application 09/479,608, incorporated herein by reference in its entirety).

The method of the invention uses millions of single-molecule DNA fragments, randomly arrayed on an optically transparent surface, as templates for hybridizing and ligating pairs of fluorescently labeled probes from IPPs. Millions of individual hybridization/ligation events across the entire array are detected simultaneously with a sensitive megapixel CCD camera with advanced optics (FIG. 6). The DNA fragments (25 to 1500 bp in length) are arrayed at a density of about 1 molecule per CCD pixel (1 to 10 molecules per square micron of substrate). Each CCD pixel defines a virtual reaction cell about 0.3 to 1 micron across, containing one (or a few) DNA fragments and hundreds of labeled probe molecules. The ability of SBH to analyze sample mixtures and assemble the sequences of the fragments they contain is of great benefit for random arrays. The DNA density can be adjusted to 1-3 fragments per pixel, which can be analyzed efficiently in more than 90% of all pixels. The volume of each reaction is about 1-10 femtoliters. A 3x3 mm array can accommodate about 100 million fragments, or about 100 billion DNA bases (equivalent to 30 human genomes).
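Array capacity scales as area x density x mean fragment length. The sketch below applies the 1 to 10 molecules per square micron stated here to a 3x3 mm array. The 1 kb mean fragment length and the ~3.1 Gb human genome size are assumptions chosen for illustration.

```python
# Sketch: capacity of a square random array from area, density and fragment length.
# The mean fragment length and genome size used below are assumptions.


def array_capacity(edge_mm: float, density_per_um2: float, mean_len_bp: float,
                   genome_bp: float = 3.1e9):
    """Return (fragments, bases, human-genome equivalents) for a square array."""
    area_um2 = (edge_mm * 1000.0) ** 2
    fragments = area_um2 * density_per_um2
    bases = fragments * mean_len_bp
    return fragments, bases, bases / genome_bp


for density in (1.0, 10.0):
    frags, bases, genomes = array_capacity(3.0, density, 1000.0)
    print(f"{density:>4} /um^2: {frags:.1e} fragments, {bases:.1e} bases, ~{genomes:.0f} genomes")
```

Under these assumptions, a 3x3 mm array holds roughly 10-100 million fragments, or about 3-30 human-genome equivalents.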

7. Combinatorial SBH

As noted above, standard SBH has significant advantages over competing gel-based sequencing techniques, including longer sample read lengths. However, standard SBH is ultimately limited by the need for exponentially larger probe sets to sequence ever longer DNA targets.

Combinatorial SBH overcomes many of the limitations of standard SBH. In combinatorial SBH (U.S. Pat. No. 6,401,267 to Drmanac, incorporated herein by reference in its entirety), two complete universal sets of short probes are contacted with the target DNA in the presence of DNA ligase. Typically, one probe set is attached to a solid support, such as a glass slide, while the other, fluorophore-labeled set is free in solution (FIGS. 6 and 7). When an attached probe and a labeled probe hybridize to the target directly adjacent to each other, they are ligated. This produces a long labeled probe covalently attached to the surface. After washing to remove the target and unligated probes, the fluorescence signal at each array position is scored using a standard array reader. A positive signal at a given position indicates that the target contains a sequence complementary to the combination of the two probes that produced it. Combinatorial SBH has large advantages in read length, cost and materials over standard SBH. For example, in standard SBH, accurate sequencing of a 10-100 kb target DNA (to find mutations) requires a full set of more than one million 10-mer probes. In contrast, combinatorial SBH generates the same set of 10-mers by combining two sets of 1024 5-mer probes. By greatly reducing experimental complexity, cost and material requirements, combinatorial SBH dramatically improves DNA read length and sequencing efficiency.
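The combinatorial argument can be verified directly. Pairing every k-mer from one complete set with every k-mer from another yields every 2k-mer exactly once. The sketch below checks this exhaustively for k = 3, which keeps the set small. It then states the 5-mer/10-mer case from the text by counting alone.

```python
# Sketch: joining one probe from each of two complete k-mer sets covers all 2k-mers.
from itertools import product
from typing import List, Set


def kmers(k: int) -> List[str]:
    """All DNA sequences of length k."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def ligation_products(k: int) -> Set[str]:
    """All sequences formed by joining one k-mer from each of two full sets."""
    probes = kmers(k)
    return {left + right for left in probes for right in probes}


# Small case checked exhaustively: two sets of 64 3-mers cover all 4096 6-mers.
print(ligation_products(3) == set(kmers(6)))  # True

# The case in the text, by counting: 2 x 1024 5-mers vs 4**10 10-mers.
print(2 * 4 ** 5, "probes synthesized vs", 4 ** 10, "10-mers covered")
```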

8. Informative probe libraries

The efficiency of combinatorial SBH is further enhanced by informative probe libraries (IPPs). IPPs are statistically selected sets of probes that are pooled during hybridization to minimize the number of combinations that must be tested. A set of IPPs containing 4 to 64 different pools is designed to identify any given target sequence unambiguously. Each set of pools contains one complete universal probe set, and each pool typically contains 16 to 256 probes. When one or more probes in a pool produce a positive signal, all probes in that pool receive a positive score. Scores from independent IPP pairings are combined into a probability score for each base position. The resulting sequence data are accurate because the score at each base position is generated from ten or more overlapping probes in different pools. A false-positive score for one probe is therefore easily corrected by the many correct scores from other pools. In addition, independent sequencing of both complementary DNA strands minimizes the effect of pool-associated false-positive probes, because the chance that the same false positive recurs on both strands is low. IPPs of longer probes actually provide more information, and more accurate data, than shorter probes scored individually. For example, for a 2 kb DNA fragment, 16,000 pools of 64 probes each give 100 times fewer false positives than 16,000 individual 7-mers.
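Pool-level scoring can be illustrated with a minimal sketch. Here a pool scores positive if any of its probes occurs in the target. A candidate sequence is then supported by the fraction of its overlapping probes that fall in positive pools. The parameters (4-mer probes, 16 pools of 16) and the seeded random shuffle are hypothetical stand-ins for statistically designed IPPs. The sketch also scores whole candidate sequences rather than computing per-base probabilities.

```python
# Sketch of pool-level scoring with hypothetical parameters: 4-mer probes,
# 16 pools of 16, assigned by a seeded shuffle (a stand-in for designed IPPs).
import random
from itertools import product
from typing import Dict, List, Set

K = 4
PROBES = ["".join(p) for p in product("ACGT", repeat=K)]


def make_pools(probes: List[str], pool_size: int, seed: int = 0) -> List[List[str]]:
    """Split a shuffled copy of the probe set into pools of pool_size."""
    shuffled = probes[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + pool_size] for i in range(0, len(shuffled), pool_size)]


def windows(seq: str) -> List[str]:
    """All overlapping K-mers of a sequence."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]


def positive_pools(target: str, pools: List[List[str]]) -> Set[int]:
    """A pool scores positive if any of its probes occurs in the target."""
    present = set(windows(target))
    return {i for i, pool in enumerate(pools) if present.intersection(pool)}


def pool_support(candidate: str, pools: List[List[str]], positives: Set[int]) -> float:
    """Fraction of the candidate's overlapping probes that fall in positive pools."""
    pool_of: Dict[str, int] = {p: i for i, pool in enumerate(pools) for p in pool}
    hits = [pool_of[w] in positives for w in windows(candidate)]
    return sum(hits) / len(hits)


pools = make_pools(PROBES, 16)
target = "ACGTTGCAAGTC"
pos = positive_pools(target, pools)
print(pool_support(target, pools, pos))          # 1.0 for the true sequence
print(pool_support("ACGTTGCTAGTC", pools, pos))  # one-base variant; may score below 1.0
```

The true sequence always receives full support, because each of its probes lies in a positive pool. A variant can still reach full support only if all of its new probes happen to share pools with true positives. The overlap of many probes per base is what makes that unlikely.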

The set of IPPs is used to obtain sequence information from the arrayed DNA targets. IPPs are carefully selected pools of oligonucleotides of a given length, each pool typically containing 16 to 128 individual probes. Every possible oligonucleotide of that length occurs at least once in each set of IPPs. One set of IPPs is labeled with a donor fluorophore and the other with an acceptor fluorophore. When a probe from the donor set is ligated to a probe from the acceptor set, the pair generates a FRET signal. This ligation occurs only when the two probes hybridize simultaneously to adjacent complementary sites on the target, thereby identifying an 8-10 base complementary sequence in the target. The length of DNA fragment that can be analyzed per pixel depends on probe length, pool size and the number of probe pool pairs tested, and typically ranges from 20 to 1500 bp. By increasing the number of pools and/or probes, target DNA of several kilobases can be sequenced. Partial sequencing and/or characterization of 1-10 kb DNA fragments can be accomplished with a small subset of IPPs or even a single probe pair. If multiple fluorescent labels are used, IPP pairs can be tested either in successive hybridization cycles or simultaneously. The fixed position of the CCD camera relative to the array ensures accurate tracking of successive hybridizations to individual target molecules.

IPPs are designed to promote strong FRET signals and sequence-specific ligation. A typical probe design has two sets. The first set of IPPs has the form 5'-F_x-N_{1-4}-B_{4-5}-OH-3', and the second set has the form 5'-P-B_{4-5}-N_{1-4}-F_y-3'. Here F_x and F_y are the donor and acceptor fluorophores, B_n denotes specific (informative) bases and N_n denotes degenerate (randomly mixed) bases. The degenerate bases increase the effective probe length without increasing experimental complexity. Each probe set requires the synthesis of 256 to 1024 probes. These are then mixed into pools of 16 or more probes, for a total of 8 to 64 IPPs per set. Individual probes may be present in one or more pools as needed to maximize experimental sensitivity, flexibility and redundancy. Pools from the donor set are hybridized to the array together with pools from the acceptor set, one pair at a time, in the presence of DNA ligase. Once every donor pool has been paired with every acceptor pool, all possible combinations of 8-10 base informative sequences have been recorded. This identifies the complementary sequences within the target molecule at each pixel. The power of this technique is that two small sets of synthetic oligonucleotide probes combine to create and score millions of long probe sequences.
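The probe counts in this design follow from the number of informative bases. With 4 bases there are 4^4 = 256 cores, and with 5 bases there are 4^5 = 1024. A donor/acceptor ligation product then reads 8-10 informative bases. The sketch below enumerates the cores and splits them into pools. It also writes the probes in the notation used above. Pooling here is simply consecutive chunks, unlike statistically designed IPPs, and labeling and synthesis are not modeled.

```python
# Sketch: enumerate informative cores for the two probe sets and split them into
# pools. Consecutive-chunk pooling is a simplification of designed IPPs.
from itertools import product
from typing import List


def informative_cores(n_info: int) -> List[str]:
    """All specific (informative) base strings of length n_info."""
    return ["".join(p) for p in product("ACGT", repeat=n_info)]


def donor_probe(core: str, n_degenerate: int) -> str:
    """Donor-set probe in the text's notation: 5'-Fx-N...-B...-OH-3'."""
    return f"5'-Fx-{'N' * n_degenerate}{core}-OH-3'"


def acceptor_probe(core: str, n_degenerate: int) -> str:
    """Acceptor-set probe in the text's notation: 5'-P-B...-N...-Fy-3'."""
    return f"5'-P-{core}{'N' * n_degenerate}-Fy-3'"


def pool(cores: List[str], pool_size: int) -> List[List[str]]:
    """Split cores into consecutive pools; every core lands in exactly one pool."""
    if len(cores) % pool_size:
        raise ValueError("pool_size must divide the number of cores")
    return [cores[i:i + pool_size] for i in range(0, len(cores), pool_size)]


for n_info, size in [(4, 32), (4, 16), (5, 16)]:
    cores = informative_cores(n_info)
    print(f"B{n_info}: {len(cores)} probes, {len(pool(cores, size))} pools of {size}, "
          f"ligated informative length {2 * n_info}")

print(donor_probe("ACGT", 2), acceptor_probe("TTGA", 2))
```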

The biochemistry of this method relies on sequence-specific hybridization and enzymatic ligation of two short oligonucleotides, using a single DNA target molecule as the template. Only a single target molecule is probed, but several hundred probe molecules of the same sequence are available for each target. This allows rapid, repeated probing that yields a statistically significant result. Efficient enzymatic ligation, combined with optimized reaction conditions, makes this rapid, repeated probing of the same single target molecule possible. At relatively high probe concentrations and reaction temperatures, individual probes hybridize rapidly (within 2 seconds) but dissociate even more rapidly (in about 0.5 seconds) unless they are ligated. By contrast, a ligated probe pair remains hybridized to the target for about 4 seconds at the optimized temperature, continuously generating a FRET signal that a CCD camera can detect. Each pixel is monitored for 60 seconds at 1-10 image frames per second. During that time, an average of 10 consecutive ligation events occurs on a perfectly matched target sequence, producing an optical signal at that location for approximately 40 of the 60 seconds. With mismatched targets, ligation efficiency is about 30-fold lower, so few ligation events occur and little or no signal is generated during the 60-second reaction.
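The match/mismatch contrast in this paragraph can be reproduced with simple arithmetic. Expected signal time is the number of ligation events multiplied by the ~4 s dwell of a ligated pair. The sketch uses the figures given above. It assumes that the mismatch event count scales linearly with the 30-fold lower ligation efficiency.

```python
# Sketch: expected FRET-on time per pixel in a 60 s window, from the figures in
# the text. Linear scaling of mismatch events with efficiency is an assumption.


def expected_signal_seconds(window_s: float, events: float, dwell_s: float) -> float:
    """Expected FRET-on time: events x dwell time, capped at the window length."""
    return min(window_s, events * dwell_s)


WINDOW_S = 60.0
DWELL_S = 4.0                          # ligated pair stays hybridized ~4 s
MATCH_EVENTS = 10.0                    # average events on a matched target
MISMATCH_EVENTS = MATCH_EVENTS / 30.0  # ~30-fold lower ligation efficiency

match_on = expected_signal_seconds(WINDOW_S, MATCH_EVENTS, DWELL_S)
mismatch_on = expected_signal_seconds(WINDOW_S, MISMATCH_EVENTS, DWELL_S)
frames = int(WINDOW_S * 10)  # frames recorded at 10 frames per second
print(match_on, round(mismatch_on, 2), frames)  # 40.0 1.33 600
```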

The main monitoring challenge is the minimization of background signal, which can be generated by the required excess of labeled probe molecules. In addition to focusing the CCD pixels on the smallest possible substrate area, our basic solution to this problem relies on a synergistic combination of surface proximity and FRET techniques (fig. 7). Prolonged excitation of the reporter label on one probe will only occur when a pair of probes are aligned on the same target molecule in close proximity to the illumination surface (e.g., a 100nm wide evanescent field resulting from total internal reflection). Thus, background signal does not arise from excess unhybridized probe in solution because either the donor is too far from the illumination surface or the acceptor is too far from the donor to cause energy transfer. In addition, probe molecules can be labeled with a variety of dye molecules (linked by branched dendrimers), increasing probe signal over the general system background.

After all IPPs have been tested, the sequences of individual molecules are assembled using SBH algorithms and software (commonly owned, co-pending U.S. patent application 09/874,772; Drmanac et al., Science 260: 1649-. These advanced statistical methods identify the sequence that matches the ligation data with the highest probability. The brightness measured with the CCD camera is converted into the probability that the perfectly matching sequence for a given probe pair is present at that pixel/target site. Several positive, overlapping probes from different pools "read" each base of the correct sequence independently (FIG. 8). Their combined probability therefore gives an accurate base call even if several probes fail. Conversely, if several independent probes corresponding to an incorrect sequence do not hybridize to the target, that sequence is assigned a low combined probability. This holds even if a few probes corresponding to the incorrect sequence score positive because they happen to share an IPP with true-positive probes that match the true sequence.

9. rSBH process

At the core of the rSBH method of the present invention is the creation and analysis of high-density random arrays containing millions of genomic DNA fragments. Random arrays eliminate the costly, time-consuming step of arraying probes on the substrate surface. They also eliminate the need to prepare thousands of sequencing templates individually. Instead, they provide a fast and cost-effective way to analyze complex DNA mixtures containing 10 Mb to 10 Gb in a single assay.

The rSBH method of the present invention combines the following advantages:

1) Ligation of combinatorial probes from two IPPs in solution generates a sequence-specific FRET signal.

2) The combinatorial approach offers accuracy, long read lengths and the ability to analyze DNA mixtures in one assay.

3) TIRM provides highly sensitive, low-background fluorescence detection.

4) Commercially available megapixel CCD cameras offer single-photon sensitivity.

The method can detect ligation events on a single target molecule because a sustained signal is generated only when two ligated probes hybridize to an attached target. This brings the donor and acceptor fluorophores within 6-8 nanometers of each other, inside the evanescent field that extends up to about 500 nanometers from the array surface.
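The strong distance dependence that makes the donor-acceptor proximity requirement selective comes from the standard Förster relation, E = 1 / (1 + (r/R0)^6). The sketch below evaluates it. The Förster radius of 5.4 nm for Cy3/Cy5 is an assumed value; reported values vary somewhat with conditions.

```python
# Sketch: Förster energy-transfer efficiency vs donor-acceptor distance.
# R0 for Cy3/Cy5 is an assumed value; published figures vary with conditions.


def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0_CY3_CY5_NM = 5.4  # assumed Förster radius, nm

for r in (2.0, 4.0, 6.0, 8.0, 20.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, R0_CY3_CY5_NM):.3f}")
```

With this assumed R0, E is 0.5 at 5.4 nm and falls below 0.1 by 8 nm. Probes diffusing tens of nanometers apart in solution transfer essentially no energy.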

The method of the present invention typically uses several thousand to millions of single-molecule DNA fragments, randomly arrayed on an optically transparent surface, as templates for hybridizing and ligating pairs of fluorescently labeled probes from IPPs (FIG. 6). Pairs of probe pools labeled with donor and acceptor fluorophores are mixed with DNA ligase and supplied to a random array. When the probes hybridize to adjacent sites on the target fragment, they are ligated together, generating a FRET signal. Sensitive megapixel CCD cameras with advanced optics simultaneously detect millions of individual hybridization/ligation events across the entire array. Each matching sequence can give rise to several independent hybridization/ligation events. This is because the ligated probe pair eventually diffuses away from the target and is replaced by newly hybridized donor and acceptor probes. Unligated probe pairs that hybridize next to each other can generate FRET signals transiently, but they do not remain bound to the target long enough to produce a significant signal.

Once the signal from the first pool pair has been detected, the probes are removed and other probe pool combinations are tested in successive ligation cycles. A full series of 256 IPP pairs (16x16 IPPs) takes 2-8 hours. Throughout these successive tests, the fixed position of the CCD camera relative to the array ensures accurate tracking. The complete sequence of each DNA fragment is compiled from the fluorescent signals generated by hundreds of independent hybridization/ligation events.
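Dividing the stated run time by the number of pool pairs gives the time budget for each ligation cycle. The sketch below performs that division.

```python
# Sketch: time available per IPP pair for a 16 x 16 pool-pair series.


def seconds_per_pair(total_hours: float, n_pairs: int) -> float:
    """Average time per pool pair, in seconds."""
    return total_hours * 3600.0 / n_pairs


n_pairs = 16 * 16
for hours in (2.0, 8.0):
    print(f"{hours:.0f} h / {n_pairs} pairs = {seconds_per_pair(hours, n_pairs):.1f} s per pair")
```

This works out to roughly 30 seconds to 2 minutes per pair. That budget has to cover the sub-minute incubation and monitoring step as well as the wash between pairs.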

DNA fragments (15-1500 bp in length) are arrayed at a density of approximately 1 molecule per square micron of substrate. Each CCD pixel defines a virtual reaction cell of about 1x1 to 3x3 microns, containing one (or a few) DNA fragments and hundreds of labeled probe molecules. The method of the invention effectively exploits the ability of SBH to analyze sample mixtures and assemble the sequences of the individual fragments in the mixture. The volume of each reaction is about 1-10 femtoliters. A 3x3 mm array has the capacity to accommodate 100-1000 million fragments or about 10-100 million DNA bases, with the upper limit being the equivalent of 3 human genomes.

The length of DNA fragments analyzable per pixel is a function of probe length, pool size and number of test probe pool pairs, and typically ranges from 50 to 1500 bp. By increasing the number of pools and/or probes, target DNA of several kilobases can be sequenced. Partial sequencing and/or characterization of 1-10kb DNA fragments can be accomplished with a small subset of IPPs or even a single probe pair.

The rSBH method of the present invention retains all the advantages of combinatorial SBH, including the high specificity of ligation. At the same time, it adds several important benefits that result from attaching the DNA fragments rather than the probes. Attaching the DNA makes it possible to use random DNA arrays, which have larger capacities than conventional probe arrays. It also allows FRET detection of the ligation of two labeled probes in solution. Furthermore, having both probe sets in solution allows the IPP strategy to be extended to both probe sets, which is not possible in conventional combinatorial SBH.

10. Method steps

The rSBH whole-sample analysis included the following processing steps, which could be integrated into a single microfluidic chip (fig. 9):

1) simple sample handling or DNA isolation (if required), including efficient collection of pathogen DNA from a pathogen mixture on a column;

2) random DNA fragmentation to generate targets of appropriate length;

3) direct attachment of the DNA ends to the activated substrate surface, for example by attachment to a universal anchor;

4) array washing to remove all unbound DNA and other molecules present in the sample;

5) introduction of a first IPP pair from the two IPP sets at an appropriate probe concentration, together with T4 DNA ligase or another DNA ligase (e.g., a thermostable ligase);

6) incubation under illumination for less than 1 minute, with signal monitoring at 1-10 frames per second;

7) washing to remove the first IPP pair, and then introducing the second IPP pair; and

8) after all IPP pairs have been tested, the computer program will generate the features or sequences of each fragment, then compare them to a comprehensive database of features or sequences, and report the nature of the DNA present in the sample.
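Steps 5-8 amount to a loop over every donor/acceptor pool pair. The sketch below shows that control flow. The `introduce`, `monitor` and `wash` callables are hypothetical stand-ins for instrument operations; no real device API is implied. The usage example drives it with dummy functions.

```python
# Sketch of steps 5-8 as a control loop. The callables are hypothetical
# placeholders for instrument operations; no real device API is implied.
from itertools import product
from typing import Callable, Dict, List, Tuple

Signals = Dict[int, float]  # pixel index -> integrated FRET signal


def run_ipp_series(donor_pools: List[str], acceptor_pools: List[str],
                   introduce: Callable[[str, str], None],
                   monitor: Callable[[], Signals],
                   wash: Callable[[], None]) -> Dict[Tuple[str, str], Signals]:
    """Test every donor/acceptor pool pair once and collect per-pixel signals."""
    results: Dict[Tuple[str, str], Signals] = {}
    for donor, acceptor in product(donor_pools, acceptor_pools):
        introduce(donor, acceptor)               # step 5: pool pair plus ligase
        results[(donor, acceptor)] = monitor()   # step 6: illuminate and record
        wash()                                   # step 7: remove the pair
    return results                               # step 8: input to sequence assembly


log: List[str] = []
res = run_ipp_series(
    ["D1", "D2"], ["A1", "A2"],
    introduce=lambda d, a: log.append(f"add {d}+{a}"),
    monitor=lambda: {0: 1.0},
    wash=lambda: log.append("wash"),
)
print(len(res), log[:2])  # 4 ['add D1+A1', 'wash']
```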

11. Device size and characteristics

The apparatus used in the method of the present invention is based on the description in commonly owned, co-pending U.S. patent application 10/738,108, which is incorporated herein by reference in its entirety. The device of the invention comprises three main components (FIG. 10):

1) A fluid-handling subsystem that manipulates (mixes, introduces and removes) IPPs. This module can be extended to include on-chip sample preparation.

2) A reaction chamber, i.e., a temperature-controlled flow-through chamber that can hold any substrate.

3) An illumination/detection subsystem.

These subsystems work together to provide single-fluorophore detection sensitivity.

The apparatus of the invention operates a plug-in reaction chamber. The chamber has a well for the array substrate and a port for connecting a probe module. It also has a port for connecting an optional array preparation module, for use if DNA attachment and/or in situ amplification is performed in the chamber.

The cartridge comprises up to 64 separate reservoirs, holding up to 32 FRET donor pools and up to 32 FRET acceptor pools (FIG. 11). The cartridge includes a mixing chamber connected to each reservoir through its own microfluidic channel and an integral vacuum/pressure-actuated microvalve.

11.1 Reaction chamber

Once the substrate is attached to the reaction chamber, it forms the bottom surface of the hybridization chamber. The chamber controls the hybridization temperature and provides a port for adding probe pools. It also removes probes from the evanescent zone, redistributes probes throughout the chamber, and washes the substrate. The labeled probe pool solution is introduced into the reaction chamber and allowed to hybridize with the target DNA for a set time (seconds). Probes not participating in a hybridization event are pulled out of the evanescent zone by applying a voltage across the hybridization solution. FRET hybridization/ligation events are detected through a window in the top of the reaction chamber. The substrate is monitored there with a high-sensitivity CCD camera capable of detecting single photons (Ha, Methods 25:78-86 (2001), incorporated herein by reference in its entirety). Images of the substrate are taken at regular intervals of about 30 seconds. The chamber is then rinsed to remove all probes, and the next probe pool is introduced. This process is repeated 256-512 times until all probe pools have been assayed.

11.2 Illumination subsystem

The illumination subsystem is based on TIRM background reduction. TIRM establishes an evanescent zone 100-500 nm thick at the interface between two materials with different optical properties (Tokunaga et al., Biochem. Biophys. Res. Commun. 235:47-53 (1997), incorporated herein by reference in its entirety). The illumination method used by the device eliminates any effect of the Gaussian intensity profile of the light beam on the measurement. The laser and all other components of this subsystem are mounted on an optical bench. A 1 cm scan line is generated by moving mirrors mounted on galvanometers 1 and 2 (FIG. 10). Galvanometer 3 then directs the scan line into the substrate through prism 1. It is adjusted so that the scan line strikes the glass/water interface at, or just beyond, the critical angle. The beam undergoes total internal reflection, creating an evanescent field on the substrate. The evanescent zone is the extension of the beam's energy a few hundred nanometers (typically 100-500 nanometers) beyond the glass/water interface.
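How the evanescent zone thickness depends on the incidence angle can be estimated with the standard total-internal-reflection penetration depth. This is the depth at which intensity falls to 1/e: d = λ / (4π sqrt(n1² sin²θ − n2²)). The refractive indices (glass 1.52, water 1.33) are assumed typical values.

```python
# Sketch: 1/e intensity penetration depth of a TIR evanescent field.
# Refractive indices are assumed typical values for glass and water.
import math


def penetration_depth_nm(wavelength_nm: float, n_glass: float, n_water: float,
                         angle_deg: float) -> float:
    """Depth d = lambda / (4 pi sqrt(n1^2 sin^2(theta) - n2^2))."""
    s = (n_glass * math.sin(math.radians(angle_deg))) ** 2 - n_water ** 2
    if s <= 0:
        raise ValueError("angle is at or below the critical angle; no total internal reflection")
    return wavelength_nm / (4.0 * math.pi * math.sqrt(s))


N_GLASS, N_WATER = 1.52, 1.33  # assumed refractive indices
critical = math.degrees(math.asin(N_WATER / N_GLASS))
print(f"critical angle ~{critical:.1f} deg")
for angle in (62.0, 65.0, 70.0):
    print(f"{angle:.0f} deg: d = {penetration_depth_nm(532.0, N_GLASS, N_WATER, angle):.0f} nm")
```

Just beyond the critical angle (about 61 degrees here), the depth is a few hundred nanometers. A few degrees steeper, it approaches 100 nm, which brackets the 100-500 nm range given above.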

11.3 Detection subsystem

The device of the invention uses a high-sensitivity CCD camera capable of photon counting, mounted above the hybridization chamber. An example is the DV887 (512x512 pixels) from Andor Technology (Hartford, CT). The camera monitors the substrate through a window in the reaction chamber. The camera lens provides enough magnification that each pixel receives light from a 3x3 micron area of the substrate. In another embodiment, the camera may be water-cooled for low-noise applications.

High-sensitivity electron-multiplying CCD (EMCCD) detectors enable high-speed detection of single fluorophores. Assume a 1 watt excitation laser at 532 nm (for Cy3/Cy5 FRET). The number of photons the laser emits per second can then be calculated, and the number reaching the detector per second estimated. From E = hc/λ, where λ is the wavelength, a 532 nm photon has an energy of 3.73e-19 joules. A laser output of 1 watt, or 1 joule/second, therefore emits about 2.68e18 photons per second. Spread across 1 cm² (1e14 nm²), this gives each square nanometer about 1e-14 joules, or about 26,800 photons, per second. Assuming a fluorophore quantum yield of 0.5, about 13,400 photons per second are emitted. With a high-precision lens, it should be possible to collect about 25% of this output, or about 3350 photons, on the CCD. The Andor DV887 CCD has a quantum efficiency of about 0.45 at 670-700 nm, where Cy5 emits, so each pixel records approximately 1500 photons per second. At 10 frames per second, this gives about 150 counts per frame. At -75 °C, the camera's dark current is about 0.001 electrons/pixel/second, i.e., an average of 1 false count per 1000 pixels per second. Even assuming 1 false count per pixel per second, or 0.1 per pixel per frame, the signal-to-noise ratio is 1500:1. Combined with TIRM illumination, the detector background is virtually zero.
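The photon-budget chain above can be recomputed in a few lines, which makes it easy to vary any one assumption. Like the text, the sketch assumes that a single fluorophore receives the photons striking 1 nm² of a 1 cm² illuminated field. Every default parameter is one of the paragraph's stated values.

```python
# Sketch reproducing the photon-budget arithmetic above. Like the text, it
# assumes one fluorophore receives the photons striking 1 nm^2 of a 1 cm^2 field.
H = 6.62607015e-34  # Planck constant, J s
C = 2.99792458e8    # speed of light, m/s


def photon_energy_j(wavelength_nm: float) -> float:
    """Photon energy E = h c / lambda, in joules."""
    return H * C / (wavelength_nm * 1e-9)


def photon_budget(power_w=1.0, wavelength_nm=532.0, field_cm2=1.0, quantum_yield=0.5,
                  collection=0.25, detector_qe=0.45, fps=10.0, noise_per_s=1.0):
    """Chain laser power through to detected counts and per-frame SNR."""
    e = photon_energy_j(wavelength_nm)
    laser_photons = power_w / e
    per_nm2 = laser_photons / (field_cm2 * 1e14)  # 1 cm^2 = 1e14 nm^2
    detected = per_nm2 * quantum_yield * collection * detector_qe
    per_frame = detected / fps
    return {
        "photon_energy_j": e,
        "laser_photons_per_s": laser_photons,
        "photons_per_nm2_per_s": per_nm2,
        "detected_per_s": detected,
        "detected_per_frame": per_frame,
        "snr_per_frame": per_frame / (noise_per_s / fps),
    }


for key, value in photon_budget().items():
    print(f"{key:>22}: {value:.3g}")
```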

11.4 Miniaturization of the device

In another embodiment, the method of the present invention may be performed in a miniaturized device. A simple physical device built from only a few off-the-shelf components can perform the entire process. The illumination and detection assembly forms the core of the system, which consists of only one CCD camera, one laser or other light source, 0 to 3 scanning galvanometers, a quartz or equivalent support for the substrate, and one reaction chamber. All of these components can fit in an enclosure of about 1 cubic foot. A robotic fluid handler or a microfluidic lab-on-a-chip device (FIG. 9), which may occupy about 0.5 cubic feet, performs the assay by drawing pairs of IPPs from two pools of 8 to 64 IPPs each. High-density multi-well plates or a lab-on-a-chip with 64 reservoirs provide very compact storage of the library. A single-board computer or laptop can run the device and perform the analysis. The system is easily transported and can be loaded into virtually any vehicle for environmental field investigations or for use by emergency response teams or biohazard workers.

The components of the system include: 1) a microcomputer (1 foot x 1 foot x 6 inches), 2) a robotic or lab-on-a-chip fluid handling system (1 foot x 1 foot x 2 inches), 3) a laser (6-inch cube), 4) a scanning galvanometer with a heat sink (3-inch cube), 5) a slide/hybridization chamber assembly (3 inches x 1 inch x 2 inches), 6) a CCD camera (4 inches x 4 inches x 7 inches), and 7) a fluid reservoir (about 10-1000 ml capacity).
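As a rough check on the compact-footprint claim, the component volumes listed above can be summed. This sketch ignores cabling, airflow and packing gaps, so it gives a lower bound only; the dictionary keys are descriptive labels, and the reservoir is assumed to be at its upper 1000 ml capacity.

```python
# Component dimensions in inches (length, width, height), as listed in the text.
COMPONENTS_IN = {
    "microcomputer": (12, 12, 6),
    "fluid handling system": (12, 12, 2),
    "laser": (6, 6, 6),
    "scanning galvanometer": (3, 3, 3),
    "slide/hybridization chamber": (3, 1, 2),
    "CCD camera": (4, 4, 7),
}
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3   # 1728
ML_PER_CUBIC_FOOT = 28316.8


def total_volume_ft3(components, reservoir_ml=1000.0):
    """Sum of box volumes plus the fluid reservoir, in cubic feet (no packing overhead)."""
    solid_in3 = sum(length * width * height for length, width, height in components.values())
    return solid_in3 / CUBIC_INCHES_PER_CUBIC_FOOT + reservoir_ml / ML_PER_CUBIC_FOOT
```

The boxes add up to 1513 cubic inches (about 0.88 cubic feet), or about 0.91 cubic feet with a full 1-liter reservoir, with the microcomputer accounting for more than half of the total.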

Another embodiment of the device of the present invention integrates a cartridge-based microfluidic substrate on which all assays for pathogen detection are performed (FIG. 11). The consumable substrate takes the form of an integrated "reaction cartridge". The substrate component of the reaction cartridge must accept three kinds of disposable integrated modules: a probe library module, a sample integration module, and a reaction substrate module. All machine functions are applied to this cartridge to produce the assay results. The substrate requires integrated fluidics, for example quick connections between the reaction cartridge and its associated modules.

Microfluidics are built into the substrate to deliver a library of informative probes to the detection surface of the substrate. With a modular approach, in which the initial probe-handling module is developed independently of the substrate, the final design can be added to a standard substrate reaction cartridge in a "plug and play" manner. The reaction cartridge contains up to 64 separate reservoirs for up to 32 FRET donor pools and up to 32 FRET acceptor pools (see FIG. 11). Larger numbers of IPPs may be stored on one reaction cartridge or a set of cartridges, e.g., 2x64, 2x128, 2x256, 2x512, or 2x1024 IPPs. The reaction cartridge has a mixing chamber connected to a main channel; each reservoir has its own microfluidic channel and an integral vacuum/pressure-actuated microvalve. When a valve is opened, vacuum is applied to draw the contents of that reservoir into the mixing chamber. The valve is then closed and the process is repeated to add a second reservoir. The mixing chamber is in line with a wash pump, which agitates the combined pools and pushes them into the reaction chamber.
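The valve sequence just described can be written down as a simple schedule generator. All step names here (`open_valve`, `apply_vacuum`, and so on) are hypothetical placeholders for whatever commands a real cartridge controller exposes; the sketch only illustrates the ordering and shows that 32 donor and 32 acceptor reservoirs give 32 x 32 = 1024 pairings, matching the 1024 library sets mentioned in section 12.

```python
from itertools import product


def mixing_schedule(n_donor_pools=32, n_acceptor_pools=32):
    """Yield (command, target) tuples for every donor/acceptor IPP pairing.

    Command names are hypothetical; a real controller would map them to
    valve, vacuum and pump actuations.
    """
    for d, a in product(range(n_donor_pools), range(n_acceptor_pools)):
        for reservoir in (f"donor_{d}", f"acceptor_{a}"):
            yield ("open_valve", reservoir)
            yield ("apply_vacuum", reservoir)   # draw reservoir into the mixing chamber
            yield ("close_valve", reservoir)
        yield ("agitate", "mixing_chamber")      # wash pump mixes the two pools
        yield ("push_to_reaction_chamber", f"donor_{d}+acceptor_{a}")
```

Generating the schedule lazily keeps the controller logic independent of the reservoir count, so the same code would cover the larger 2x128 to 2x1024 IPP configurations.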

12. Software components and algorithms

The raw data consist of about 3-30 intensity values per pixel for each pair of probe pools (i.e., each IPP pair) at different time/temperature points. These values are obtained by statistical processing of 10-100 CCD measurements (preferably 5-10 per second). Each segment has 512 sets of 3-30 intensity values. An array having one million segments contains about one hundred million intensity values. Signal normalization can be performed on groups of hundreds of pixels. If a group does not show the expected characteristics, all data points for the given IPP pair are discarded. Individual pixels that yield no useful data (i.e., too few positive or negative data points) are discarded, although most pixels are expected to carry suitable DNA. The distribution of intensity values among the remaining pixels is determined and used to adjust the base-calling parameters.
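A minimal sketch of group-wise normalization and pixel filtering follows. The text does not specify the normalization method or thresholds, so this assumes median normalization per IPP pair within a pixel group, treats an IPP pair whose group median is not positive as a failed group, and uses hypothetical cut-offs of 2x and 0.5x the group median for positive and negative calls.

```python
import statistics


def normalize_group(group):
    """Normalize a pixel group (rows = pixels, columns = IPP pairs).

    Each column is divided by its median across the group; columns whose
    median is not positive are treated as failed and set to None.
    """
    n_cols = len(group[0])
    medians = [statistics.median(row[j] for row in group) for j in range(n_cols)]
    return [[row[j] / medians[j] if medians[j] > 0 else None for j in range(n_cols)]
            for row in group]


def keep_pixel(norm_row, hi=2.0, lo=0.5, min_pos=1, min_neg=1):
    """Keep a pixel only if it has enough clear positives and clear negatives.

    hi, lo, min_pos and min_neg are hypothetical parameters, not values from the text.
    """
    values = [v for v in norm_row if v is not None]
    n_pos = sum(v >= hi for v in values)
    n_neg = sum(v <= lo for v in values)
    return n_pos >= min_pos and n_neg >= min_neg
```

For example, in the group `[[10, 1], [1, 10], [2, 2]]` both column medians are 2, so the first two pixels each have one clear positive and one clear negative and are kept, while the flat third pixel is discarded.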

Each short fragment can be mapped to its corresponding reference sequence using its recorded signature, and then analyzed by comparative sequencing or assembled de novo using SBH algorithms. Starting from the primer sequence, fragments of about 250 bases each are assembled from about one million possible 10-mers. Assembly is performed by evaluating the combined 10-mer records computed from the overlapping 10-mers of millions of candidate local sequence variants.
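The variant-scoring idea can be sketched as follows. This is a simplification, assuming positive/negative calls are already available per individual k-mer; in rSBH each record actually reports on a pool of probes, so a real scorer would work on pool-level records. The function names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def score_candidate(candidate, positive_kmers, k=10):
    """Fraction of the candidate's overlapping k-mers that were recorded as positive."""
    words = kmers(candidate, k)
    if not words:
        return 0.0
    return sum(w in positive_kmers for w in words) / len(words)


def best_variant(candidates, positive_kmers, k=10):
    """Choose the candidate local sequence whose k-mers best agree with the records."""
    return max(candidates, key=lambda c: score_candidate(c, positive_kmers, k))
```

With k = 4 and records taken from `ACGTTGCA`, the true sequence scores 1.0, while the single-substitution variant `ACGATGCA` scores 0.2 because only its last 4-mer, `TGCA`, is supported.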

One set of fragments from one array region represents a long contiguous genomic fragment whose sequence overlaps substantially with several sets of fragments from other array regions. These groups can also be identified by aligning the short fragment sequences to a reference sequence, or as DNA islands, i.e., clusters of occupied pixels surrounded by empty pixels. Assigning short fragments to groups is a nontrivial algorithmic problem, especially in partially structured arrays.
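Identifying DNA islands amounts to finding connected components of occupied pixels. The sketch below assumes a boolean occupancy grid is already available and uses 4-connectivity (up, down, left, right); both choices are assumptions, since the text does not define pixel adjacency.

```python
from collections import deque


def dna_islands(occupied):
    """Group occupied pixels into 4-connected islands separated by empty pixels.

    occupied: 2-D list of truthy (occupied) / falsy (empty) values.
    Returns a list of islands, each a list of (row, col) coordinates.
    """
    rows, cols = len(occupied), len(occupied[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = []
    for r in range(rows):
        for c in range(cols):
            if not occupied[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited occupied pixel.
            island, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                island.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and occupied[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            islands.append(island)
    return islands
```

Each island is then a candidate group of short fragments derived from one long molecule. Boundary cases, such as two islands touching, still need the sequence-overlap checks described above.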

The short fragments within a group come from a single fragmented DNA molecule and do not overlap. However, short fragment sequences do overlap between groups, which represent ordinary overlapping DNA fragments, and the long fragments are assembled in the same way as cosmid or BAC clone sequences are assembled in shotgun sequencing. Because the long genomic fragments in the rSBH method range from 5 to 100 kb and represent 5-50 genome equivalents, mapping information is available at every relevant level to guide accurate contig assembly. This approach can tolerate omissions and errors in assigning short fragments to long fragments, as well as about 30-50% of fragments randomly missing from a single group.

The rSBH method of the present invention can detect rare organisms and quantify the cell number or gene expression of each microorganism. When the dominant species has 1x genome coverage, a species present at a 0.1% level is represented by about 10 genome fragments. DNA normalization can further improve detection sensitivity to one cell in more than 10,000 cells. DNA quantification is accomplished by counting the DNA fragments that represent a gene or an organism. Because there is no cloning step, rSBH should give a more quantitative estimate of the frequency of each DNA sequence type than conventional sequencing. For quantification studies, it is sufficient to fragment the sample directly into 250 bp fragments and form a standard (unstructured) random array. Partial normalization can be used to reduce differences in abundance; because some difference remains, the original frequencies can be recovered using a normalization curve. An array of one million fragments is sufficient to quantify hundreds of species and their gene expression.
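Counting fragments per organism can be turned into relative cell abundance with one extra assumption: for randomly fragmented DNA, an organism contributes fragments in proportion to its cell count times its genome length. The sketch divides each count by genome length before normalizing. This correction is an assumption added for illustration, not a step stated in the text, and the names are hypothetical.

```python
def relative_abundance(fragment_counts, genome_lengths):
    """Estimate relative cell abundance from fragment counts per organism.

    Assumes fragment yield is proportional to (cells x genome length), so each
    count is divided by that organism's genome length before normalizing.
    """
    per_genome = {name: fragment_counts[name] / genome_lengths[name]
                  for name in fragment_counts}
    total = sum(per_genome.values())
    return {name: value / total for name, value in per_genome.items()}
```

For two organisms with 1,000 fragments each but genomes of 1 Mb and 4 Mb, the estimate is 80% and 20% of cells. With equal genome lengths, 10,000 versus 10 fragments corresponds to the second organism making up roughly 0.1% of cells.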

12.1 rSBH software

The present invention provides software that supports rSBH sequencing of whole genomes (complex DNA samples). The software can be scaled up to analyze the entire human genome (~3 Gb) or mixtures of genomes up to 10 Gb. Parallel computation across several CPUs is contemplated.

The rSBH device can generate a series of TIFF images at up to 10 images per second or faster. Each image represents the hybridization of the target to a pooled pair of labeled primers. Multiple images can be acquired for each hybridization to average the signal. The target DNA is fragmented into many fragments of about 100-500 bases in length. These fragments are attached to the surface of a glass substrate in a random distribution. After hybridization and washing away of the unhybridized probe, an image of the surface is taken with a CCD camera. In the resulting image, each pixel may contain one fragment, although some pixels may be empty while others may contain two or more fragments. The device may image 100-.

The total run time of the device is the hybridization/wash/imaging cycle time (~1 minute) multiplied by the number of library sets used. With 1024 library sets (yielding 1024 images), a run lasts about 17 hours; using two colors cuts this time in half. The image analysis software processes the images in near real time and sends the data to the base-calling software.

A. Parallel processing

rSBH analysis is ideally suited to parallel processing. Because each "spot" hybridizes to a different fragment, base calling can be performed for each spot in parallel, with no communication between the analyses. The only communication during the whole analysis is between the control module (GUI) and the analysis programs, so only minimal measures are needed to avoid race conditions. In practice, the number of CPUs limits the number of parallel processes. For a million fragments, a computer with 100 processors divides the job into 100 parallel base-calling processes, each analyzing 10,000 or more fragments in succession.
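Because fragments are independent, the work splits into contiguous chunks with no shared state. The sketch below uses `concurrent.futures.ThreadPoolExecutor` so it runs anywhere; for CPU-bound base calling in Python, a `ProcessPoolExecutor` (with top-level, picklable functions) or separate machines would be the more realistic choice. `call_fragment` is a hypothetical stand-in for the per-fragment base caller.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + q + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def call_chunk(fragments, call_fragment):
    """Process one chunk sequentially; no state is shared with other chunks."""
    return [call_fragment(f) for f in fragments]


def parallel_base_calls(fragments, call_fragment, n_workers):
    """Run call_fragment on every fragment, n_workers chunks at a time, preserving order."""
    chunks = split_into_chunks(fragments, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(call_chunk, chunks, [call_fragment] * len(chunks))
    return [r for chunk in results for r in chunk]
```

Splitting one million fragments across 100 workers yields chunks of exactly 10,000, matching the example in the text. `Executor.map` returns results in input order, so no reordering is needed afterwards.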

A group of 200 fragments may be run on one processor, or it may be spread over several CPUs. Without mutation testing (an advanced function), the optimized base-calling procedure can be completed in about 100 milliseconds per fragment, including data loading and normalization. For the longest reference sequences (see below), a reference search time of 100 milliseconds may be added; the search time is proportional to the reference length and is negligible for short references. Analysis of multiple mutations can extend the run time to about 1 minute per multiple-mutation site. If the average analysis time is 1 second per fragment, one million fragments can be analyzed in 10,000 seconds with 100 CPUs. Similarly, 200 fragments can be analyzed in 200 seconds with one CPU or in 20 seconds with 10 CPUs. Optimizing program speed requires a large amount of RAM per CPU. As described below, with 2 GB to 8 GB per CPU the software is not memory-bound, depending on the number of CPUs and the number of fragments. Systems with more than 32 GB of RAM are currently available.
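The run-time figures above follow from one line of arithmetic: fragments per CPU (rounded up) times the per-fragment analysis time. A minimal sketch, assuming perfectly even division of work and no scheduling overhead:

```python
def analysis_time_s(n_fragments, seconds_per_fragment, n_cpus):
    """Wall-clock time for independent per-fragment analyses spread over CPUs."""
    fragments_per_cpu = -(-n_fragments // n_cpus)   # ceiling division
    return fragments_per_cpu * seconds_per_fragment
```

At 1 second per fragment this reproduces the text: 10,000 seconds for one million fragments on 100 CPUs, and 200 or 20 seconds for 200 fragments on 1 or 10 CPUs.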

B. Data flow

The GUI and the image analysis program run on one CPU, while the base-calling program runs on several (N) CPUs. At start-up, the image analysis program is given the number N and monitors the directory to which the CCD camera writes TIFF images. For each TIFF file, it takes the record of each fragment and distributes the records into N files, one for each analysis CPU. For example, with 200 fragments and 10 CPUs, the image analysis program writes the records of the first 20 fragments into one file and signals the first analysis CPU process, writes the records of the next 20 fragments into a second file and signals the second analysis CPU process, and so on. Other communication mechanisms, such as sockets or MPI, are also contemplated; file input/output should therefore be confined to one module so that it can easily be swapped out later.
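The file-based hand-off can be sketched as below. The one-subdirectory-per-CPU layout and the tab-separated `fragment_index<TAB>intensity` line format are assumptions made for illustration; the text does not specify a file format. Using fixed contiguous blocks guarantees that each CPU always receives the same fragments from every image.

```python
import os


def distribute_records(records, n_cpus, out_dir, image_name):
    """Write one image's per-fragment records into n_cpus files, one per analysis CPU.

    Fragment i goes to the CPU owning its contiguous block, so each CPU sees the
    same fragments across all images. Directory layout and line format
    ("index<TAB>value") are hypothetical.
    """
    block = -(-len(records) // n_cpus)   # ceiling division
    paths = []
    for cpu in range(n_cpus):
        cpu_dir = os.path.join(out_dir, f"cpu_{cpu}")
        os.makedirs(cpu_dir, exist_ok=True)
        path = os.path.join(cpu_dir, f"{image_name}.txt")
        with open(path, "w") as fh:
            for i in range(cpu * block, min((cpu + 1) * block, len(records))):
                fh.write(f"{i}\t{records[i]}\n")
        paths.append(path)
    return paths
```

Keeping all file handling inside this one function is what makes it easy to swap in sockets or MPI later.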

Over time, a growing number of image analysis files is generated as TIFF files accumulate. The present invention provides a separate image analysis directory for each base-calling CPU. Each base-calling CPU monitors its own image analysis directory and loads the data as soon as they become available. The RAM per CPU needed to store all image data is 2 bytes x (number of fragments) x (number of images) ÷ N. For 1 million fragments and 1024 images, this is 2 GB per CPU with 1 CPU, or 200 MB per CPU with 10 CPUs.
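The memory formula can be evaluated directly. A minimal sketch, assuming one 2-byte value per fragment per image, as the text states:

```python
def image_ram_bytes(n_fragments, n_images, n_cpus, bytes_per_value=2):
    """RAM per base-calling CPU to hold all image intensities: 2 B x fragments x images / N."""
    return bytes_per_value * n_fragments * n_images / n_cpus
```

For one million fragments and 1024 images this gives 2.048e9 bytes (about 2 GB) on a single CPU and 2.048e8 bytes (about 200 MB) per CPU when the work is split over 10 CPUs.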

The other large (in terms of RAM) data input to the base-calling program is the reference sequence (length L). To optimize speed, the reference is converted into a vector of 10-mer (and 11-mer, 12-mer) positions, so that the positions of the highest-scoring probes of each fragment can be found quickly (see below). This is the fastest way to store reference position data on each base-calling CPU. The memory required for the reference position data is the larger of 2 bytes x L and 2 bytes x 4¹². For the largest reference (L = 10 Gb), this is 2 bytes x 10 G = 20 GB. The reference sequence itself must also be stored, but this takes only 1 byte per base, or as little as 0.22 byte per base when compressed.
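Two pieces of this data layout can be sketched: a k-mer position index and a compact base encoding. The dict-of-lists index below favors clarity over the compact 2-byte-per-position vector the text describes. The 2-bit packing reaches 0.25 byte per base; the text's 0.22 byte per base would need additional compression on top of it. Function names are hypothetical.

```python
from collections import defaultdict

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_position_index(reference, k=10):
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def pack_2bit(seq):
    """Pack an A/C/G/T string at 2 bits (0.25 byte) per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= BASE_CODE[base] << (2 * (i % 4))   # low bits hold the first base
    return bytes(out)
```

Looking up a positive probe in the index returns all its reference positions at once. Those positions are what the binning step uses to locate each fragment.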

Analysis of each fragment yields a called-sequence result. These results are appended to a file written to the image analysis directory associated with each CPU. When base calling is complete, the GUI processes the called-sequence files: it loads the files from all CPUs and orders the fragments by position to produce the final complete called sequence. This reordering is simple, because the fragments have already been located in the preceding reference search step. The GUI may also provide a visualization tool for the called sequence and may display an intensity map of the final sequence; in that case, the base-calling program must also output an intensity file (appended to the called-sequence data).

Existing base-calling programs (e.g., for HyChip™) output a short report file with records organized by reference and spot. Such a report may not be useful for rSBH, because the spots of each fragment are distributed over many hybridization slides. Instead, a new short report is generated for each hybridization that is much more concise than the HyChip short report. Specifically, the new report may list the number of full matches (N) on each slide and the median of the top N records. It may also give the median of any control spots, such as standard reference or empty spots, if present. The advantage of this new report is that individual images can be assessed in real time on a frequently updated GUI desktop. This tells the user early (and throughout the run) whether the rSBH system is producing useful data, rather than waiting a day to see the final result. An advanced use of the new report is to let the user send feedback to the rSBH device, for example pausing or stopping the run from the GUI, or repeating the run of any library set that fails. The GUI may also display device parameters, such as hybridization and wash temperatures, in real time during the run. Finally, the GUI may integrate the device into a user's command-and-control module.
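The per-hybridization short report can be sketched in a few lines. The `full_match_threshold` that decides whether a record counts as a full match is a hypothetical parameter; the text does not say how full matches are called.

```python
import statistics


def short_report(intensities, full_match_threshold, control_values=()):
    """Per-hybridization summary: number of full matches N, median of the
    top N records, and median of the control spots (if any)."""
    n_full = sum(v >= full_match_threshold for v in intensities)
    top = sorted(intensities, reverse=True)[:n_full]
    return {
        "n_full_matches": n_full,
        "median_top_n": statistics.median(top) if top else None,
        "median_controls": statistics.median(control_values) if control_values else None,
    }
```

Because the report reduces each image to a few numbers, the GUI can refresh it after every hybridization and flag a failing library set within minutes.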

C. Base calling

Because the pooled probe library is the same for every fragment, the rSBH base-calling program needs to read it only once for all fragments. The base-calling procedure requires a reference sequence as input. For rSBH, the reference is found by analyzing how the top several hundred records of a fragment cluster in the reference. A simple binning algorithm over the positions of the highest records is the most efficient, because it finds the maximum bin count in a single pass over the bin positions; the window with the maximum bin count locates the fragment in the reference sequence. With 250 bp fragments and 1024 assays, about 1/4 of a fragment's records are positive (i.e., full-match hybridization records). Because each record represents a complex probe pool, the positive records together cover about 1/4 of all possible 10-mers, and because every 10-mer recurs in references longer than 4¹⁰ bases, about 1/4 of all 10-mer positions in the reference match a positive record. The same holds for 11-mers and 12-mers: about 1/4 of all reference probes are positive. For a processor that can place one probe position into a bin in 1 ns, locating a fragment in the reference takes about L/4/10⁹ seconds. For the extreme value L = 10,000,000,000, this is 2.5 seconds per fragment on one CPU. The total reference-search time for one million fragments on 100 CPUs is then 25,000 seconds (roughly 7 hours).

An alternative to binning the positions of the highest L records is to perform de novo sequence assembly of each fragment, which reduces the number of probes used in the above example to well below 250. If the de novo algorithm is fast (e.g., under 1 millisecond), it will speed up fragment localization. A fast de novo algorithm may find sets of 10 or more (up to 250) records with overlapping probes, and may reduce the required time by an order of magnitude or more.
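One simple way to find runs of overlapping probes is greedy extension. Starting from a seed k-mer, it extends one base at a time while exactly one positive k-mer overlaps the current end by k-1 bases. This is a stand-in technique (a greedy, de Bruijn-style walk), not the algorithm the text specifies. It assumes per-k-mer positive calls and k ≥ 2.

```python
def extend_greedily(seed, positive_kmers):
    """Extend a seed k-mer to the right while exactly one positive k-mer overlaps
    the current end by k-1 bases; stop at branches, dead ends, or repeats.

    Assumes all k-mers have the same length k >= 2.
    """
    k = len(seed)
    by_prefix = {}
    for word in positive_kmers:
        by_prefix.setdefault(word[:-1], []).append(word)
    contig, used = seed, {seed}
    while True:
        nexts = by_prefix.get(contig[-(k - 1):], [])
        if len(nexts) != 1 or nexts[0] in used:
            return contig
        used.add(nexts[0])
        contig += nexts[0][-1]   # append the one new base
```

The resulting contig (for example 10 or more chained probes) is a much more specific query than individual records, which is why such a step could shorten the reference search.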

D. Base calling algorithm

1. Read the probe library file.

2. Read the reference sequence (length RL) and store it in a reference sequence object.

2a. Generate the reference sequence position data structure.

3. Read the intensity files (in real time, as they are generated by image analysis).

3a. Store the values in a storage data structure.

4. Accumulate the highest L records for each fragment (L = median fragment length).

5. Loop over each fragment:

5a. Build a list of the reference sequence positions of the highest L records.

5b. Create a vector of length [RL ÷ (m × L)] to hold bins for the positions of the highest L records.

Each bin then spans m × L bases, where m should be about 1.5 to provide a margin on either side of the fragment.

5c. Place the positions of the highest L records into the bin vector.

5d. Find the region with the highest total bin count. This locates the fragment in the reference sequence to within (m - 1) × L base positions.

5e. Perform base calling using the fragment's reference sequence.

5f. Append the called sequence to a file, referred to as the "sequence" file (including position information).
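Steps 5b to 5d can be sketched as a single counting pass over the record positions. This literal reading uses non-overlapping bins of width m × L and returns the fullest bin. A fragment straddling a bin boundary splits its votes, which a real implementation might handle by also scoring adjacent-bin sums (a refinement not described in the text). The function name is hypothetical.

```python
import math


def locate_fragment(record_positions, reference_length, fragment_length, m=1.5):
    """Bin the reference positions of a fragment's highest records into bins
    of width m*L and return (start, end) of the fullest bin (steps 5b-5d)."""
    width = math.ceil(m * fragment_length)
    counts = [0] * (reference_length // width + 1)    # about RL / (m*L) bins
    for pos in record_positions:
        counts[pos // width] += 1                     # single pass over positions
    best = max(range(len(counts)), key=counts.__getitem__)
    return best * width, min((best + 1) * width, reference_length)
```

For a 250-base fragment whose highest records fall at positions 5000-5249, plus a few scattered false hits, the fullest 375-base bin spans 4875-5250, which brackets the fragment with the margin that m = 1.5 provides.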

13. Additional embodiments

The methods of the invention allow probes and IPPs to be designed in a variety of ways. In one embodiment, probes and IPPs are designed by varying the number of probes per pool, more specifically from 4 to 4096 probes per pool. In a second embodiment, probes and IPPs are designed by varying the number of pools per group, more specifically from 4 to 1024 pools per group. Probes may contain 2 to 8 informative bases, providing a total of 4-16 informative bases. In another embodiment, the probes are prepared as a library using degenerate synthesis at some positions. Another embodiment combines two sets of IPPs in which different probes are mixed in a single pool.

A small set of 20 to several hundred probes can provide a unique hybridization signature for an individual nucleic acid fragment. The hybridization pattern is matched to sequences to identify a pathogen or any other nucleic acid, e.g., to count mRNA molecules. One embodiment of the method of the invention uses such signatures to identify the same molecule on different random arrays. Different subsets of test probes can then be hybridized to different arrays prepared from the same sample, and, once the same identifying probe set has been hybridized on each array, the data for each individual molecule can be combined across arrays.
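Matching molecules across arrays by their signatures can be sketched as a nearest-neighbor search in Hamming distance over the 0/1 calls of the shared probe set. The mismatch tolerance is a hypothetical parameter. The sketch does not enforce one-to-one matching, which a production version would need.

```python
def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def match_molecules(sigs_a, sigs_b, max_mismatches=1):
    """Pair molecules on array A with molecules on array B whose hybridization
    signatures (tuples of 0/1 calls for the shared probe set) nearly agree."""
    matches = {}
    for id_a, sig_a in sigs_a.items():
        best = min(sigs_b, key=lambda id_b: hamming(sig_a, sigs_b[id_b]))
        if hamming(sig_a, sigs_b[best]) <= max_mismatches:
            matches[id_a] = best
    return matches
```

With 20 or more probes, the text notes that signatures are already close to unique. A small mismatch tolerance lets the matching absorb occasional miscalled probes.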

Another embodiment of the method of the present invention performs single-molecule DNA analysis using only a single set of IPPs or a single probe, without combinatorial ligation. In this embodiment, the FRET signal is detected by labeling the target with the donor fluorophore and the probe with the acceptor fluorophore, or the target with the acceptor fluorophore and the probe with the donor fluorophore. Probes of the form 5'-N_x-B_4-16-N_y-3' can be synthesized individually or as a library containing degenerate bases (mixtures) at specific positions. In another embodiment, probes or probe pools hybridized to the target are extended by a polymerase that incorporates one or more labeled nucleotides, which are typically differentially labeled.
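A degenerate probe of the form 5'-N_x-B-N_y-3' is a mixture of all sequences in which each N position takes any of the four bases. The sketch below enumerates that mixture for a fixed informative core B. It shows only what such a synthesis represents; the function name is hypothetical.

```python
from itertools import product


def expand_probe(n_left, informative, n_right, alphabet="ACGT"):
    """Yield every sequence of the form 5'-N_x-B-N_y-3', where B is the fixed
    informative core and each N position is a degenerate four-base mixture."""
    for left in product(alphabet, repeat=n_left):
        for right in product(alphabet, repeat=n_right):
            yield "".join(left) + informative + "".join(right)
```

The mixture size grows as 4^(x+y), so one N on each side of a core already corresponds to 16 distinct molecules in the same synthesis.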

Another embodiment of the method of the invention uses cyclic probe removal, so that one target molecule is tested multiple times with the same probe sequence; the probes can be repeatedly moved away from and back to the support surface by an electric field, a magnetic field, or a flow of solution. Each cycle lasts from 1-10 seconds up to 20-30 seconds. The fluorescence signal may be recorded at each phase of the cycle, only after probe removal has begun, or only after probe removal has been completed. Removal may be coupled with temperature cycling. In this embodiment, probe removal does not require a FRET label but can rely on direct fluorescence from a single label. Alternatively, a FRET signal can be generated between the labeled probe and a dye molecule attached to the target molecule.
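Repeated tests of the same target with the same probe can be combined into a single call by counting how often the signal appears. The threshold and the majority-style `min_fraction` rule below are assumptions chosen for illustration, not parameters from the text.

```python
def call_from_cycles(signals, threshold, min_fraction=0.5):
    """Call a probe positive for one target molecule if the signal exceeds
    `threshold` in at least `min_fraction` of the recorded cycle phases."""
    hits = sum(s > threshold for s in signals)
    return hits / len(signals) >= min_fraction
```

A single missed or spurious phase then does not change the call. This is the benefit of testing one molecule many times with the same probe.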

In another embodiment, the same probe sequence is tested repeatedly by loading the same probe into the reaction chamber multiple times from an external container. Each probe load is first quickly removed with a wash buffer that removes free probes but not full-match hybrids (or, if ligation is used, the ligation products of the two probes). All hybrids are then melted with a second wash before the next probe load is introduced.

In another embodiment, the interaction of each probe with the target molecule is measured only once. The process relies on redundant representations of the same DNA segment at different locations within the array, and/or on the accuracy of a ligation event.

As an alternative to preparing the final fragments before loading them onto a support to form an array, another embodiment of the method of the invention uses a two-stage fragmentation procedure. The sample DNA is first randomly cleaved into longer fragments (about 2-200 kb or longer). The mixture of fragments is loaded onto a support that may be divided by a hydrophobic material into a grid of cells, each about 10 x 10 microns in size. The sample concentration is adjusted so that most cells receive one or a few long fragments. These fragments are then further randomly fragmented in situ into final fragments of about 20-2000 bases, which attach to the support surface. The optimal cell size depends on the total length of DNA introduced per cell and the preferred length of the final fragments. This fragmentation method provides long-range mapping information, because all short fragments in a cell belong to one or a few long fragments, and these long fragments overlap with long fragments in other cells. This information simplifies the assembly of long DNA sequences and can provide whole-chromosome haplotype structure.
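How the sample concentration controls cell occupancy can be estimated if long fragments land in cells independently at random. In that case the number per cell follows a Poisson distribution with mean λ. This is a modeling assumption for illustration; the text only says the concentration is adjusted.

```python
import math


def occupancy_fractions(mean_fragments_per_cell, max_k=5):
    """Poisson probabilities that a cell receives k long fragments, for k = 0..max_k."""
    lam = mean_fragments_per_cell
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k + 1)]
```

At λ = 1, about 37% of cells are empty, 37% hold exactly one long fragment, and the remaining 26% hold two or more. Raising λ trades empty cells for cells with several overlapping long fragments.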

In another embodiment of the invention, selected target DNA is captured from a complex sample using, for example, a column containing equivalent amounts of DNA molecules for particular genes or organisms. For example, selected viral or bacterial genomes, or parts of them, may be present on these columns as attached single-stranded DNA (ssDNA). If the sample is double-stranded DNA (dsDNA), it is melted so that the complementary strand can be captured by hybridization to the immobilized DNA. Excess complementary DNA and any unrelated DNA are washed away. The captured DNA is then released by high temperature or chemical denaturation. In the diagnosis of infectious agents, human or other complex DNA can be removed by this method. Methods are also provided for reducing the concentration of over-represented agents so that other agents present at low copy numbers can be detected on smaller arrays. The capture process can be performed in a tube, a well of a multi-well plate, or a microfluidic chip.

Specific genes or other genomic fragments can be selected by cleaving the DNA with restriction enzymes that cut downstream of their recognition sites and ligating matched adaptors (described in co-owned, co-pending U.S. patent application 10/608,293, incorporated herein by reference in its entirety). Fragments not captured by an adaptor are destroyed or removed. Another embodiment uses oligonucleotides of 6-60 bases, more preferably 10-40 bases, or even more preferably 15-30 bases, designed to pair with a given sequence with one or more mismatches, so that the DNA can be cleaved by a cleavage enzyme that recognizes mismatches. Two oligonucleotides can be designed to cleave complementary strands with an offset of about 1-20 bases, creating sticky ends for ligating adaptors or vector arms. With two pairs of such oligonucleotide cleavage templates, a genomic fragment can be excised and captured, or its ends modified for capture with a specific adaptor. Cleavage templates can be synthesized individually, or one or more short oligonucleotide libraries can be designed to provide a universal source of the cleavage templates needed for any DNA. A library of 256 oligonucleotides can be represented by a consensus sequence such as nnbbbbnn, or cggnnbbbbnn and nnbbbbnncac, where n represents a mixture of the four bases or a universal base, b represents one specific base, bbbb represents one of the 256 possible 4-mer sequences, and cgg and cac are examples of specific sequences shared by all members of the library, which can be used to create cleavage templates. To create a cleavage template, two or three members selected from the corresponding oligonucleotide libraries can be ligated using an assembly template such as nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngtgg.
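The 256-member library can be enumerated directly. This sketch assumes the consensus nnbbbbnn, with two degenerate positions on each side of a specific 4-mer. Optional shared prefixes or suffixes such as cgg or cac are passed in as plain strings, and 'n' is kept as a symbol for the mixed or universal position rather than expanded.

```python
from itertools import product


def cleavage_oligo_library(prefix="", suffix="", n_flank=2, core_len=4):
    """One library member per specific core: prefix + n..n + bbbb + n..n + suffix,
    where 'n' remains a degenerate (mixed or universal) position."""
    flank = "n" * n_flank
    return [prefix + flank + "".join(core) + flank + suffix
            for core in product("acgt", repeat=core_len)]
```

Because only the four core positions are specific, 4^4 = 256 syntheses cover every possible 4-mer, and the shared ends give every member the same ligation handle.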

In addition to various chemical attachment methods, DNA fragments prepared by random or specific cleavage can be attached to a surface through anchors, linkers, primers, or other specific binders attached to the surface. One embodiment uses randomly attached anchors with sticky ends about 1-10 bases long, to which ssDNA or dsDNA fragments with matching sticky ends are ligated. The sticky ends on the DNA fragments can be provided by linkers attached to them. This method makes it possible to use surface anchors with different sticky ends to identify the terminal sequences of the attached fragments. Another embodiment attaches to the support a primer that is complementary to an adaptor attached to the DNA fragments. After the ssDNA hybridizes to the primer, the primer is extended with a polymerase. The resulting dsDNA is melted, and the strands not attached to the support are removed, for DNA amplification as described below. In another embodiment, the surface is covered with a specific binder (e.g., a cyclic peptide) that recognizes the 3' or 5' ends of DNA fragments and binds them with high affinity.

Analyzing short fragments with linkers attached on one or both ends can facilitate reading through palindromes and hairpin structures: when a fragment end falls within a palindrome or hairpin, the ligated linker sequence is not complementary to the rest of the structure. The linker thus allows every base of the target DNA to be read with all overlapping probes.

In another embodiment, the accuracy and efficiency of detection are increased by using a random array of single molecules followed by in situ local amplification (Drmanac and Crkvenjakov, 1990, supra, incorporated herein by reference), producing up to 10, up to 100, up to 1000, or up to 10,000 attached copies within the same pixel area. In this case, single-molecule sensitivity is not required and multiple recordings of each probe are unnecessary, although FRET and TIRM can still be used. The amplification uses: 1) one primer (about 1000-²) 2) sample DNA fragments modified with a linker, and a second primer in solution. It is desirable to minimize mixing and diffusion, for example by embedding the target and second primer in a gel, or by using a capillary chamber (a cover slip spaced only 10-100 microns from the support). The population of molecules amplified from a single target molecule forms a spot, or "amplicon", which should be less than 10-100 microns in size. Amplification of hybridization or ligation events can also be used to increase the signal.

One preferred embodiment uses continuous isothermal amplification (e.g., various types of strand displacement amplification), because it does not require denaturing dsDNA at high temperatures, which can cause extensive diffusion or turbulence, and because the displaced strands bind no complementary DNA other than the attached primers, producing locally high concentrations of DNA. In another embodiment using isothermal amplification, at least one linker (ligated to one end of the target DNA) is designed with a core sequence of low melting temperature (e.g., a TATATAT… sequence with 3-13 TA repeats), to which the primer is largely complementary. At the optimal temperature for the strand-displacing polymerase used in this reaction, the dsDNA at the TATATA… site melts locally, allowing primer hybridization and the start of a new replication cycle. The length (i.e., stability) of the core can be adjusted for temperatures between 30 and 80 ℃. In this continuous amplification reaction (CAR), synthesis of a new strand begins as soon as the polymerase synthesizing the previous strand has moved away from the primer site, which takes on the order of seconds. With this method, high concentrations of ssDNA can be generated from dsDNA using only one primer. For amplification with one primer attached to the surface, the low-melting linker should be at the unattached end, and the corresponding primer should be free in solution. CAR requires no enzyme other than the polymerase. Adapters are introduced by ligation to the DNA fragments or by tail extension of target-specific primers; the latter requires two or more initial amplification cycles on the source dsDNA, which may require high-temperature melting.

The nucleic acid analysis methods described above, based on hybridization of probes or probe pools to random arrays of sample DNA fragments, alone or combined with base extension or ligation of two probes, are useful in a variety of applications, including: sequencing of longer DNA (including bacterial artificial chromosomes (BACs) or whole viral, bacterial, or other complex genomes) or mixtures of DNA; diagnostic sequence analysis of selected genes; whole-genome sequencing of newborns; agricultural biotechnology research to understand precisely the genetic makeup of new crops and animals; monitoring of single-cell expression; cancer diagnosis; sequencing for DNA computing; environmental monitoring; food analysis; and discovery of new bacterial and viral organisms.

The method of the invention produces sufficient signal from a single labelled probe to reduce the background below the threshold for detection. Special substrate materials or coatings (e.g. metallizations) and advanced optical systems are used to reduce the high system background, which impedes the use of light from 1cm²Surface parallel detection of millions of single molecules. In addition, the background generated during sample introduction or during DNA attachment can be reduced by special handling of the sample, including affinity columns, modified DNA attachment chemistry (e.g., ligation), or the exclusion of DNA-specific binding molecules (e.g., cyclic peptides). In some cases, reducing the background generated by unbound probe complexes in solution or assemblies on a substrate requires cycling away unhybridized/ligated probes, which can be done by: an electric field pulse; specifically engineered ligases, enzymes having optimized thermostability and full match specificity; or a triple FRET system with a third dye (e.g., quantum dots) attached to the target molecule.

In another embodiment, the method of the invention uses an electric field to increase the concentration of DNA molecules on the support, so that all fragments from a chromosome or genome are captured on the surface of the random array. Fragmenting chromosomes in a way that allows correct assembly may require compartmentalized substrates and in situ fragmentation of individual 100 kb to 1 Mb DNA fragments into linked sets of shorter 1-10 kb fragments.

Rapid hybridization/ligation, which allows the target to be probed multiple times with a pair of primer libraries in less than 60 seconds per cycle, may require optimized buffers and/or active probe manipulation, possibly using electromagnetic fields. The chemical and physical stability of the system (including the attached target DNA molecules) is increased by precisely controlling the excitation properties of the fluorescent dyes (or dendrimers) and the illumination (nanosecond laser pulses) in a manner compatible with DNA stability, allowing several hours of illumination.

Fast real-time image processing, assembly of individual fragments from overlapping probes, and assembly of entire genomes from overlapping DNA fragments may require programmable logic arrays or multiprocessor systems for high-speed computation.
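As a rough illustration of assembling a fragment from overlapping probe hits, the sketch below assumes error-free detection of every k-mer present in a fragment (its "spectrum") and a known first k-mer, and extends the sequence one base at a time while exactly one extension is supported. This is a textbook simplification of SBH reconstruction, not the assembly procedure of the invention; the function and variable names are hypothetical.

```python
def extend_from_spectrum(spectrum, start, max_len=10_000):
    """Greedily rebuild a sequence from its set of k-mers.

    spectrum: set of k-mers scored positive on one fragment (assumed error-free)
    start:    the first k-mer of the fragment (assumed known)
    Extension stops when no next k-mer, or more than one, is present.
    """
    k = len(start)
    seq = start
    while len(seq) < max_len:
        suffix = seq[-(k - 1):]
        # candidate next k-mers overlap the current end by k-1 bases
        nexts = [suffix + b for b in "ACGT" if suffix + b in spectrum]
        if len(nexts) != 1:
            break  # end of fragment, or an ambiguous branch
        seq += nexts[0][-1]
    return seq


fragment = "ACGTTGCAAGGCTTAC"
spectrum = {fragment[i:i + 4] for i in range(len(fragment) - 3)}
rebuilt = extend_from_spectrum(spectrum, fragment[:4])
print(rebuilt == fragment)  # True
```

In this simplified model a repeated (k-1)-mer creates a branch and halts extension, and `max_len` guards against looping on repeats.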

The method of the invention relies on the following process: a visible fluorescence signal is generated by labeled probes and DNA-processing enzymes upon specific molecular recognition of complementary DNA sequences. By relying on sequence recognition and enzymatic proofreading processes produced by natural evolution, rSBH avoids the major technical challenge of physically distinguishing individual DNA bases, which are only 0.3 nanometers in size and differ from each other by only a few atoms. The method of the invention also provides very simple sample preparation and handling, including random fragmentation of chromosomes or other DNA to form random single-molecule arrays with approximately one DNA molecule per 1-10 µm². The method simultaneously collects data from millions of single DNA fragments at high speed. With 10 fluorescent colors and a 10-megapixel CCD camera, a single rSBH device can read 10⁵ bases per second. The read length is tunable from about 20 to 20,000 bases per fragment, for a total of up to 1 billion bases per experiment on a single random array. By initially fragmenting a single long fragment and attaching the corresponding set of short fragments to a separate random sub-array, the effective read length of the rSBH method can be as high as 1 Mb. For each single DNA molecule tested, 100 independent detections were obtained per base (i.e., 10 overlapping probes, each tested in an average of 10 successive events), which ensures maximum sequencing accuracy.
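The redundancy figure above is the product of the number of overlapping probes covering a base and the number of repeated detection events per probe. The short calculation below reproduces it and, purely as an illustration, converts the stated single-device rate of 10⁵ bases per second into bases per day; the variable names are hypothetical.

```python
probes_per_base = 10      # overlapping probes that read each base
events_per_probe = 10     # average repeated detection events per probe
detections_per_base = probes_per_base * events_per_probe

bases_per_second = 10**5  # stated single-device read rate
bases_per_day = bases_per_second * 24 * 60 * 60

print(detections_per_base)  # 100
print(bases_per_day)        # 8640000000
```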

Combining PCR amplification of samples several kilobases in length with SBH using IPPs provides sequence data with an accuracy of over 99.9%. This read length is many times longer than that obtained with currently used gel-based methods and allows whole genes to be sequenced in a single assay. The method of the invention combines the parallelism, accuracy and simplicity of hybridization-based DNA analysis with the miniaturization efficiency and low material cost of single-molecule DNA analysis. The use of a universal probe set, combined with ligation and informative probe libraries, allows efficient and accurate analysis of any and all DNA molecules, and detection of any sequence changes therein, with a single panel of oligonucleotide probe libraries. The method of the invention uses an integrated system employing well-known biochemistry and informatics to process ultra-high-density random single-molecule arrays, achieving sequencing throughput 1,000 to 10,000 times higher than existing gel and SBH sequencing methods. The method of the invention will allow sequencing of all nucleic acid molecules present in a complex biological sample, including a mixture of bacterial, viral, human and environmental DNA, without DNA amplification or millions of cloning operations. Minimal sample handling, low chemical consumption and a fully integrated approach reduce the sequencing cost per base by at least 1,000-fold. The methods of the invention enable sequencing of an entire human genome on a single array within a day.

Random arrays of short DNA fragments are easily prepared and are 100 times denser than most standard DNA arrays currently in use. Probe hybridization to the array and advanced optics allow ultra-fast parallel data collection with a megapixel CCD camera. Each pixel in the array monitors hybridization to a different DNA molecule, providing tens of millions of data points at a rate of 1-10 frames per second. Random arrays may contain more than 1 billion base pairs on a 3 x 3 mm surface, with each DNA fragment represented in 10-100 pixel units. The inherent redundancy of the SBH method, in which several independent overlapping probes read each base, helps ensure the highest final sequence accuracy.

To achieve the full capacity of the ligation method of the present invention, i.e., reading up to 1000 bases per molecule, multiple IPP reagents must be processed simultaneously. The ligation method of the present invention does not require covalent modification of each target molecule to be analyzed. Because the SBH probes are not covalently bound to the target, they can be easily removed or photobleached between cycles. By contrast, in polymerase-based methods a single base can be tested only once in any given DNA molecule. The hybridization/ligation method of the present invention allows repeated probing with each given probe and probing of each base by several overlapping probes, providing a 100-fold increase in the number of measurements per base. In addition, ligase allows the use of larger label structures (e.g., dendrimers with multiple fluorophores, or quantum dots) than polymerase does, which further improves detection accuracy.

The method of the invention allows universal profiling of long DNA molecules with a small, incomplete set of commonly used probes. Each pixel can analyze a single molecule up to 10 kb in length. With fragments 10,000 bp in length, an array of 1 million fragments comprises one trillion (10¹²) DNA bases, corresponding to 300 human genomes. The array was analyzed with a 10-megapixel CCD camera, and the information profile was obtained within 10-100 minutes, depending on the degree of multiplex labeling. Arrays 10-, 100- or 1000-fold smaller are also very useful in characterization, sequencing or quantification applications.
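A quick check of the genome-equivalent figure, assuming a haploid human genome of about 3.1 × 10⁹ bases (an approximation not stated in the text):

```python
total_bases = 10**12      # stated array content, in bases
human_genome = 3.1e9      # assumed haploid human genome size, in bases

genome_equivalents = total_bases / human_genome
print(round(genome_equivalents))  # 323, i.e. roughly 300 genomes
```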

In one embodiment, 10-10,000 fragments represent one pathogen cell or virus in the array, so no DNA amplification is required. The single-molecule characterization method of the present application provides a comprehensive survey of every region of the pathogen genome, a significant improvement over analyzing multiplex amplifications of thousands of DNA amplicons on a standard probe array. DNA amplification is a nonlinear process that is unreliable at the single-molecule level. Instead of amplifying several segments per pathogen, a pathogen affinity column is used to reduce the concentration of unwanted or contaminating DNA, and the entire genome of the collected pathogen can be analyzed. Single viral or bacterial cells can be collected from among thousands of human cells; each collected cell is represented by 1-10 kb fragments on 10-1000 pixels, providing accurate identification and precise DNA classification.

In another embodiment, the method of the invention is used to detect and protect against biological warfare agents. rSBH identifies node markers, which allow immediate detection of bio-agents at the level of individual organisms, before pathogenicity and symptoms occur. SBH provides a comprehensive analysis of any or all genes involved in pathogen attack patterns, virulence and antibiotic susceptibility, in order to quickly understand the genes involved and how to counteract any and all of them. rSBH allows analysis of complex biological samples containing a mixture of pathogen, host and environmental DNA. In addition, the method of the invention provides rapid, low-cost comprehensive detection for monitoring the environment and/or personnel, and can be implemented as portable products.

14. Reagent kit

The invention also provides IPP kits as products, which may be supplied as cassettes preloaded with probes; the kit optionally comprises a ligation mixture containing a buffer and an enzyme.

The invention also provides pathogen- or gene-specific sample preparation kits and protocols for analyzing pathogens, such as Bacillus anthracis and Yersinia pestis, from samples such as blood. The invention provides for pooling DNA products from sample preparation onto a substrate to form an rSBH array of the invention. A step-by-step method is described for generating an array with a single target per pixel, optionally followed by in situ amplification producing 10-1000 copies per pixel. Random arrays of target DNA were generated and subjected to rSBH sequence analysis. The modular approach to substrate preparation allows early versions of the substrate to have simple sample application sites, while the final substrate can have a "plug and play" array preparation module.

DNA samples meeting minimum purity and specification requirements will be used as the starting material for actual sample integration with the rSBH sample arraying technique. Sample integration begins with enzymatic digestion (restriction enzyme or nuclease digestion) of the original sample preparation product, generating fragments approximately 250 bp in length with specific (or random) cohesive ends. This enzyme mixture is one of several components that may be provided in a product kit.
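An average of about 250 bp is what a four-base recognition site gives on random sequence, since such a site is expected once every 4⁴ = 256 bases. The sketch below checks this by cutting a random sequence immediately before every GATC (the MboI/DpnII site, chosen here only as an example of a four-base cutter; the text does not name the enzyme) and measuring the mean fragment length. The helper name `digest` is hypothetical.

```python
import random


def digest(seq, site):
    """Return fragment lengths after cutting immediately before each occurrence of site."""
    cuts = [i for i in range(1, len(seq) - len(site) + 1) if seq.startswith(site, i)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(200_000))
lengths = digest(genome, "GATC")
mean_len = sum(lengths) / len(lengths)
print(len(lengths), round(mean_len))  # mean should be near 4**4 = 256
```

Real genomes deviate from uniform random sequence, so actual mean fragment lengths for a given enzyme will differ somewhat.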

Arraying the digested DNA involves attaching its sticky ends to complementary sequences arrayed on the surface. The array surface was modified from its original glass surface as follows: 1) forming an aminopropylsilane monolayer; 2) activating it with a symmetric diisothiocyanate; and 3) modifying the activated array surface with a heterogeneous monolayer of probes made from a fresh mixture of aminated oligonucleotides (including capture probes, primer probes and spacer probes).

All attachment probes share a conserved design (>90%), which prevents the formation of homogeneous islands in which spacer and capture probes are segregated. The ratio of capture probes to all other probes produced an average density of one complementary ligation site (for sample and capture probe) per square micron, each square micron being observed by a single pixel of the ultrasensitive CCD. Next, the digested DNA sample was added to the pre-formed array surface and ligated to the capture probes with T4 ligase, yielding new rSBH reaction sites with one target per pixel. Excess sample is removed from the array surface, and the dsDNA is converted to ssDNA by heating and further washing. A phosphorylation strategy in the capture probe design ensures that only one strand is covalently attached to the rSBH array, while the other strand is washed away.

If local in situ amplification of the target is necessary to produce a satisfactory detection signal, suitable well-known techniques can be used (Andreadis and Chrisey, Nucleic Acids Res. 28:E5 (2000); Abath et al., BioTechniques 33:1210 (2002); Adessi et al., Nucleic Acids Res. 28:E87 (2000), each incorporated herein by reference in its entirety). Isothermal strand displacement techniques may be most suitable for local low-copy-number amplification. To space out the capture probes, spacer probes and primer probes need to be incorporated. These probes share some conserved sequences and structures, and each serves the function its name describes: the capture probe captures the target DNA, the spacer probe helps form a suitably spaced probe monolayer, and the primer probe, if desired, is present for in situ amplification. All targets carry the same array primer sequence, which simplifies the task. Once the sample is attached to the array, the free ends of the arrayed DNA are available for universal primer amplification. In situ amplification is performed on the molecules within the array using standard protocols and materials (i.e., primers, polymerase, buffers, dNTPs, etc.). Although 10-1000 copies may be sufficient, only about 50 copies are required. Each target can be amplified with different efficiency without affecting the sequence analysis.

In summary, sample integration and rSBH array formation require digestion of the DNA product of the original sample preparation, isolation, and pooling onto the substrate to form the rSBH array. The present invention provides reagents and kits for the digestion, isolation and ligation steps.

15. Sequence listing

The sequence listing lists the polynucleotide sequences described herein and was submitted on an optical disc containing the file labeled "CAL-2 CIPPPCT.txt" (8.00 KB; 8,192 bytes), created on an IBM PC under the Windows 2000 operating system at 2004, 26:18 AM. The sequence listing entitled "CAL-1 CIPPTC.txt" is incorporated herein by reference in its entirety. The computer-readable form ("CRF") and three copies ("copy 1", "copy 2" and "copy 3") of the sequence listing "CAL-2 cippct.txt" are submitted herewith. Applicants state that the contents of the CRF and of copies 1, 2 and 3 of the sequence listing, filed in accordance with 37 CFR § 1.821(c) and (e), respectively, are identical.

Examples

1. Sequencing a bacterial genome

The entire genome of a normally avirulent laboratory bacterial strain is sequenced. A well-characterized E. coli strain of known sequence was selected, and its entire genome was sequenced in a one-day assay. This assay illustrates the overall operation of the diagnostic system and determines important specifications for the design of system inputs and outputs, as well as general requirements for the isolation and preparation of the original sample.

A single colony from a streak plate or a few microliters of liquid culture provides sufficient material. Cells are lysed and DNA is isolated using protocols well known in the art (see Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1989, or Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989, both incorporated herein by reference in their entirety). The yield is not critical; the important factor is DNA quality. The sample specifications defined in this example apply to all other samples. Ten to 100 genome copies were used for the final analysis. Additional requirements of the assay are: 1) the DNA is free of DNA-processing enzymes; 2) the sample is free of contaminating salts; 3) the entire genome is evenly represented and constitutes the majority of the total DNA; 4) DNA fragments are between 500 and 50,000 bases in length; and 5) the sample is provided as a sterile DNA solution at a known concentration (e.g., 1.0 µg/ml; 1 µl is sufficient).
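To relate genome copy numbers to DNA mass, the sketch below assumes an E. coli genome of about 4.6 Mbp and an average mass of about 650 g/mol per base pair; both are common approximations, not values given in the text.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # g/mol per base pair (approximation)
GENOME_BP = 4.6e6     # approximate E. coli genome size (assumption)

grams_per_copy = GENOME_BP * BP_MASS / AVOGADRO
femtograms_per_copy = grams_per_copy * 1e15   # about 5 fg per genome

for copies in (10, 100):
    picograms = copies * femtograms_per_copy / 1000
    print(copies, "copies ->", round(picograms, 3), "pg")

# genome copies contained in 1 ul of a 1 ug/ml solution (= 1 ng)
copies_in_1ng = 1e-9 / grams_per_copy
print(f"{copies_in_1ng:.1e} genome copies in 1 ng")
```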

An input of 10-100 genome copies ensures overlap across the entire genome and allows target differences to be captured on the array. With 10-100 copies, enough overlapping fragments were obtained to ensure reliable base calling with high accuracy. The rSBH samples had a mass of about 1-10 picograms, most of which was used for sample identification and quantification. Samples for analysis were obtained by serial dilution of the identified products.
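Why 10-100 copies give overlap across the whole genome can be illustrated with the classic Lander-Waterman approximation: if fragments land at random and the genome is covered c-fold on average, the expected fraction of bases not covered by any fragment is about e^(-c). This textbook model is assumed here for illustration only; it ignores capture bias and fragment loss, and the genome size is an approximation.

```python
import math

GENOME_BP = 4.6e6  # approximate E. coli genome size (assumption)


def expected_uncovered_bases(coverage, genome_bp=GENOME_BP):
    """Expected number of bases hit by no fragment at mean fold-coverage `coverage`."""
    return genome_bp * math.exp(-coverage)


for coverage in (10, 100):
    print(coverage, f"{math.exp(-coverage):.1e}", round(expected_uncovered_bases(coverage), 1))
```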

The DNA must be free of proteins, especially nucleases, proteases and other enzymes. Most proteins are removed and inactivated by phenol-based extraction, such as PCI (phenol:chloroform:isoamyl alcohol) extraction (Sambrook et al., 1989, supra; Ausubel et al., 1989, supra). Hypotonic or detergent-based lysis (with a nuclease inhibitor cocktail, such as EDTA and EGTA) followed by PCI extraction is a fast and efficient method of sample lysis and one-step DNA isolation. Phase-locked extraction (available for 3 '5') simplifies this task, producing pure DNA. No DNA digestion is required at this point, because the shear forces during lysis and extraction produce fragments in the desired length range. Phenol removal was accomplished by stringent purification of the DNA (i.e., subsequent chloroform extraction, ethanol precipitation and size exclusion). Ultraviolet (UV) spectra, in which residual phenol leaves characteristic features, were used for purity detection and DNA quantification.
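UV-based quantification commonly uses the standard conversion that an A260 of 1.0 (1 cm path) corresponds to about 50 µg/ml of double-stranded DNA, with the A260/A280 ratio (about 1.8 for clean DNA) as a purity indicator. The sketch below applies these conventions; the acceptance window for the ratio and the example readings are hypothetical, and the text does not specify which ratios or thresholds were used.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # standard conversion for double-stranded DNA


def dna_concentration(a260, dilution_factor=1.0):
    """Concentration in ug/ml from absorbance at 260 nm (1 cm path length)."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def looks_pure(a260, a280, low=1.7, high=2.0):
    """True if the A260/A280 ratio falls inside an assumed acceptable window."""
    ratio = a260 / a280
    return low <= ratio <= high


print(dna_concentration(0.5))   # 25.0 ug/ml
print(looks_pure(0.5, 0.27))    # ratio ~1.85 -> True
```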

The DNA must be free of contaminating salts and organics and suspended in an SBH-compatible Tris buffer. This is achieved by size exclusion chromatography or microdialysis.

The original DNA samples ranged from 500 bp to 50,000 bp. Fragments shorter than 500 bp are difficult to renature during isolation and purification and also interfere with the arraying process. Fragments longer than 50,000 bp are difficult to dissolve and can aggregate irreversibly.

The sample is provided as at least 1 microliter of a sterile 1 µg/ml solution. The total amount of original DNA required is only about 1 ng to 1 pg, which is less than 1% of the amount to be sequenced.

For final sample preparation, digestion of the DNA yields fragments with an expected average length of about 250 bp and sticky ends that can be used to array the molecules on the surface of the combinatorial array. The molecules are spaced so that one molecule appears per square micron, which is observed by a single pixel of the CCD camera and represents one virtual reaction well within an array of millions of wells. This requires eliminating self-assembled monolayer (SAM) effects. An enzyme-driven procedure is used that links the sample to specific, spaced sites within a single layer of a combinatorial array chemically attached to the surface of the detection substrate. The capture array is formed by SAM chemistry, but small variations in the terminal complementary overhangs should not produce islands of similar sequence. Thus, a substrate is prepared with a capture array, and the sample is attached to the substrate surface by enzymatic ligation of the appropriate overhangs.
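Because molecules attach at random positions, an average of one molecule per pixel does not mean exactly one molecule in every pixel. Under a Poisson model (assumed here; surface effects can make real arrays deviate), the fractions of empty, singly occupied and multiply occupied pixels follow directly from the mean density λ per pixel. The helper name is hypothetical.

```python
import math


def occupancy(lam):
    """Fractions of pixels holding 0, exactly 1, and 2+ molecules for Poisson mean lam."""
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return p0, p1, 1.0 - p0 - p1


for lam in (0.5, 1.0, 2.0):
    p0, p1, p_multi = occupancy(lam)
    print(f"lambda={lam}: empty={p0:.3f} single={p1:.3f} multiple={p_multi:.3f}")
```

At λ = 1 about 37% of pixels carry exactly one molecule, which is the largest single-occupancy fraction any Poisson loading can give.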

In addition, each target needs to be amplified in situ, producing "amplicons". Amplification uses a universal primer adaptor ligated to the end of the target sequence that was not attached in the initial capture ligation. A new strand is synthesized with DNA polymerase and dNTPs, displacing the original complementary strand; the displaced strand carries elements complementary to the capture array and is therefore captured and ligated in the same way. Linear amplification is expected to generate 10 copies. In addition, an exponential amplification strategy can be used to generate 100-.
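The difference between the two strategies can be sketched with an idealized model assuming a fixed per-round efficiency: in linear amplification only the immobilized template is copied each round, while in exponential amplification every molecule present acts as a template. The efficiency values and round counts below are hypothetical.

```python
def linear_copies(rounds, efficiency=1.0):
    """New copies made when only the original template is copied each round."""
    return rounds * efficiency


def exponential_copies(rounds, efficiency=1.0):
    """Total molecules (template included) when every molecule is copied each round."""
    return (1.0 + efficiency) ** rounds


print(linear_copies(10))           # 10.0
print(exponential_copies(7))       # 128.0
print(exponential_copies(7, 0.8))  # fewer molecules at 80% per-round efficiency
```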

The arrayed samples, whether single molecules or local amplicons, were subjected to rSBH cycle sequencing with dedicated probes and integrated microfluidics. Bioinformatics is fully integrated for data collection, storage, analysis and sequence alignment. Genomic sequences of the candidate organisms are reported with base calls and statistical analysis of accuracy.

2. Preparation of samples from cell cultures or blood samples of Bacillus anthracis and Yersinia pestis

2.1 Whole genome analysis

Isolating a particular pathogen from a raw sample requires isolating or enriching its cells from the sample and then lysing them to release the genome. Standard biochemical and cell biology laboratory techniques, such as fractional centrifugation, filtration, culture or affinity chromatography, are used to isolate the cells before the genome is extracted. Most pathogens are at least two orders of magnitude smaller than human cells and several orders of magnitude larger than most biomolecular structures, which reasonably allows easy isolation by traditional physical techniques. Laminar flow separation is preferably performed with commercially available antibodies or other affinity tools already used for certain targets, such as viral coat proteins, which minimizes risk. Once the organisms are enriched, they are lysed and the DNA is isolated using standard protocols.

Alternatively, genomic amplification can be performed with specific primers (for heterogeneous primary samples) or universal primer sets (for isolated cell types) containing reversible affinity tags. The sample is lysed, if necessary, and the original DNA is isolated. The primers and amplification mixture are added to the original sample together, and the products are isolated by reversible labeling and affinity capture.

2.2 Genome footprint analysis

The method includes amplifying a set of footprint genes specific to the organism of interest. By detecting multiple gene regions simultaneously, different strains of the same pathogen can be distinguished, or a large number of different pathogens can be screened. Assays useful for detecting various bio-threat pathogens are described in the following documents (Radnedge et al., Appl. Environ. Microbiol. 67:3759-; Appl. Environ. Microbiol., in press (2003), incorporated herein by reference in their entirety). DNA regions specific to the pathogen of interest, and absent from its close relatives, are identified. Primers were then designed to detect these regions by amplification of DNA in environmental samples. Bacillus anthracis and Yersinia pestis were used as model organisms. Defined amounts of pathogen cells were mixed with human blood to determine detection sensitivity. Patients with early symptoms carry either of these pathogens in their blood at concentrations >10⁴ cells/ml. The aim is to detect the pathogen before it reaches the symptomatic phase. The assay covers 10¹ to 10⁵ cells/ml. Genomic DNA was extracted using the QIAamp Tissue Kit 250 (Qiagen, Valencia, CA) or the NucleoSpin Multi-8 Blood kit (Macherey-Nagel, Düren, Germany). Pathogen concentrations were determined by plating mid-log-phase cells and by cytometry, and 10 microliters of diluted cells were added to 190 microliters of human blood to approximate pre-symptomatic concentrations.
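The spiking step (10 µl of diluted cells into 190 µl of blood) is a 1:20 dilution, so the cell stock must be 20 times more concentrated than the desired final concentration. A minimal helper, with hypothetical names, for choosing stock concentrations across the 10¹ to 10⁵ range given in the text:

```python
CELL_VOLUME_UL = 10.0    # volume of diluted cells added
BLOOD_VOLUME_UL = 190.0  # volume of human blood


def stock_needed(target_cells_per_ml):
    """Cells/ml the diluted culture must contain to reach the target in the final mix."""
    dilution = (CELL_VOLUME_UL + BLOOD_VOLUME_UL) / CELL_VOLUME_UL  # = 20
    return target_cells_per_ml * dilution


for exponent in range(1, 6):  # 10^1 ... 10^5 cells/ml
    target = 10 ** exponent
    print(f"{target:>7} cells/ml in blood <- {stock_needed(target):.0e} cells/ml stock")
```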

3. Assays for preparing 100 diagnostic targets from biohazard-free samples of Bacillus anthracis and Yersinia pestis

The targets are selected to identify potential antibiotic resistance regions, mutations in virulence genes, and vector sequences that can serve as markers of genetic engineering. Such targets, particularly virulence and antibiotic resistance genes, are generally not unique to a particular pathogen but provide additional qualitative information. The target DNA will be amplified with 50 primer pairs per pathogen to probe the unique and qualitative regions of interest. The products were pooled into one sample for SBH analysis. Multiplexed primer pairs may be used to simplify amplification of the target sequences.

The primers carry a cleavable tag for separating the amplicons from the original complex DNA mixture. Preferably, the tag is biotin/streptavidin-based with a DTT-cleavable disulfide bond, or is a specifically engineered restriction site within the primer. Amplicons were isolated by affinity capture and released as purified DNA samples. The product was further purified by size exclusion to remove unwanted salts and organics, and then quantified for downstream integration.

4. Sequencing samples from microbial biofilms

The genomes of biofilm communities were examined with rSBH combined with field studies and FISH. With rSBH, biofilm populations were sequenced at more than one time point and from different habitats to determine genospecies. DNA normalization between samples simplified the analysis, highlighting genotype-level differences in community structure and providing significant coverage of the genomes of low-abundance genospecies. A 16S rDNA clone library was constructed for each sample according to a well-established protocol. Phylogenetically discriminating FISH probes, and targeted SNPs that discriminate subtle variants within a phylogenetic group, were used to map distribution patterns and to correlate SBH-determined genospecies with 16S rDNA phylogenetic distributions. Samples were collected from habitats with different physical and chemical properties, and the major environmental parameters were measured at the time of collection, including pH, temperature, ionic strength, redox state (i.e., the Fe²⁺/Fe³⁺ ratio) and the concentrations of dissolved organic carbon, copper, zinc, cadmium, arsenic and other ions.

5. Base call simulation test

Simulated data were generated from E. coli using fragments of 250 bp average length with 90% overlap. The first 10,000 fragments were analyzed using standard single-base-change calling; this number is more than sufficient to check accuracy and timing. The reference sequence search located all 10,000 fragments in the entire 4 Mbp reference genome, and the base calls made on each fragment were correct. The fragments were searched against the full 4 Mbp reference sequence, confirming that the timing and accuracy of the search were independent of the number of fragments tested. Reference sequence search and base calling required 0.8 seconds per fragment. Base calling includes testing for single base changes and normalization to optimize accuracy. Margins are allowed on both sides of the fragment in the reference sequence search, which increases the processing time.
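The reference-search and base-calling steps can be sketched as follows, assuming an exact-match k-mer index of the reference, a fragment that differs from the reference only by substitutions, and placement chosen by seed voting. This is a simplified stand-in, not the procedure used in the simulation (which the text does not specify); the function names, seed length and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def place_and_call(fragment, reference, index, k):
    """Place a fragment by seed voting, then list (ref_pos, ref_base, frag_base) mismatches."""
    votes = Counter()
    for j in range(len(fragment) - k + 1):
        for i in index.get(fragment[j:j + k], ()):
            votes[i - j] += 1  # implied start of the fragment in the reference
    if not votes:
        return None, []
    start, _ = votes.most_common(1)[0]
    calls = [(start + j, reference[start + j], b)
             for j, b in enumerate(fragment)
             if 0 <= start + j < len(reference) and reference[start + j] != b]
    return start, calls


random.seed(0)
ref = "".join(random.choice("ACGT") for _ in range(5000))
frag = list(ref[1200:1450])                      # a 250 bp fragment
frag[100] = "A" if frag[100] != "A" else "C"     # introduce one substitution
frag = "".join(frag)

idx = build_index(ref, 12)
start, calls = place_and_call(frag, ref, idx, 12)
print(start, calls)
```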

6. Single Q-dot arrays and imaging

Two microliters of 0, 8, 160 and 400 picomoles of streptavidin-conjugated Q-dots (Qdot, Hayward, CA) were deposited for 2 minutes on the surface (at the center) of biotin-modified coverslips (Xenopore, Hawthorne, NJ). The droplets were removed by vacuum. Ten microliters of deionized water were applied and removed in the same manner, and this wash was repeated 4 times. Each coverslip was inverted and placed on a clean slide, to which it adhered via 1 microliter of water. A small amount of objective immersion oil was placed around the coverslip edge to form a seal and prevent evaporation.

Imaging was performed on a Zeiss Axiovert 200 microscope with epi-illumination through a Plan-Fluar 100x oil-immersion objective (NA 1.45). Emission from the 655 nm Q-dots was imaged with a standard Chroma Cy3 filter set; the transmission spectrum of the Chroma Cy3 emission filter overlaps the emission of the 655 nm Q-dots. Images were recorded with a Roper Scientific CoolSNAP HQ™ camera (Roper Scientific, Tucson, AZ) with an exposure time of 50 milliseconds. The images show that higher Q-dot concentrations produce more visible dots. The control coverslip spotted with water contained only a few visible spots, owing to various contaminants. In addition to stably fluorescent Q-dot clusters of the expected color, blinking spots were also seen that differed in individual brightness and color. These features indicate that these spots are single Q-dots. The differences in brightness can be explained by differing emission wavelengths, by positions outside the focal plane, or by variation in activity between individual particles. The significance of these results is that individual molecules labeled with Q-dots can be detected using advanced microscopy techniques. For routine, accurate detection of single fluorescent molecules, more efficient laser excitation, together with a TIRF system to further reduce background, is desirable.
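Counting spots per field is one simple way to quantify the concentration dependence described above. The sketch below, written in plain Python so it needs no imaging library, thresholds a 2-D intensity grid and counts 4-connected bright regions; the threshold and the toy image are hypothetical, and a real analysis would also need background subtraction and handling of clustered dots.

```python
def count_spots(image, threshold):
    """Count 4-connected regions of pixels whose intensity exceeds threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                spots += 1
                stack = [(r, c)]  # flood-fill this spot
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return spots


toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 0, 0, 6, 0],
    [0, 0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 9],
]
print(count_spots(toy, threshold=4))  # 4
```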

7. Ligation signals with spotted targets and oligonucleotides

These experiments were designed to demonstrate that: 1) a spotted target can be used as a template for ligating two probes, with good perfect-match specificity; and 2) spotted oligonucleotides can be used as primers (or capture probes) to attach target DNA to the surface.

A. Slide preparation

Four 5'-NH₂-modified oligonucleotides that can serve as targets, primers or capture probes (sequences in Table 1) were spotted at 7 different concentrations (1, 5, 10, 25, 50, 75 and 90 pmol/µl) on 1,4-phenylene diisothiocyanate-derivatized slides, with 6 replicates per concentration. The long oligonucleotide Tgt2-Tgt1-rc contains the entire Tgt2 sequence and part of the Tgt1 complementary sequence (the underlined portions are complementary in antiparallel orientation). Using Tgt2-Tgt1-rc as a test target that can be captured by Tgt1, capture efficiency can be determined by comparing two-probe ligation on directly spotted Tgt2 with ligation on Tgt2 sequences captured by Tgt1.

TABLE 1. Oligonucleotides used as targets, primers or capture probes

B. Experiment 1

Hybridization/ligation was performed in a closed chamber at room temperature for 1 hour. The reaction solution contained 50 mM Tris, 0.025 units/µl T4 ligase (Epicentre, Madison, WI), 0.1 mg/ml BSA, 10 mM MgCl₂ and 1 mM ATP, pH 7.8, together with a pool of ligation probes (see Table 2) at 0.005 to 0.5 pmol/µl. After the reaction, the slides were washed with 3X SSPE at 45 °C for 30 minutes, then washed 3 times with double-distilled water and dried by centrifugation. The slides were then scanned on an Axon GenePix 4000A with the PMT set at 600 V.

TABLE 2

Note: TAMRA-labeled positions are marked; underlined bases indicate the positions of single-base mismatches.

C. Experiment 2

Slides spotted with the four NH₂-modified 26-32-mers were hybridized for 2 hours at room temperature with 1 pmol of the long target Tgt2-Tgt1-rc (Table 1) dissolved in 20 µl of 50 mM Tris, 0.1 mg/ml BSA and 10 mM MgCl₂, pH 7.8. The slides were washed with 6X SSPE for 30 minutes at 45 °C. The ligation probes (Tgt2-5' probe and Tgt2-3' probe, Table 2) were then incubated on the slides for 1 hour at room temperature in the presence of 0.5 units of T4 ligase per 20 µl. After the reaction, the slides were washed and scanned as described above.

D. Results

1. The ligation signal depends on the concentration of the spotted target and on the concentrations of the 5' probe and 3' probe in the reaction solution

FIG. 12 shows the dependence of the ligation signal on the spotted target and on the ligation probes in solution. The highest signal was obtained when the spotted target concentration was about 75 pmol/µl and the ligation probes (5' probe and 3' probe) were about 1 pmol in 20 µl of reaction solution. These dependencies indicate that the observed signal is indeed ligation-dependent and that the spotted target can serve as a ligation template. The signal difference between perfect-match ligation probes and single-base-mismatch probes was about 4- to 20-fold (Table 3).

Table 3: Perfect-match (FM) versus single-mismatch (SMM) discrimination of ligation signals

Target	FM/SMM, 5' probe	FM/SMM, 3' probe
Tgt1	14	20
Tgt2	7	12
Tgt3	9	16
Tgt4	4	4
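The 4- to 20-fold range quoted in the text can be read straight off Table 3; the snippet below stores the table as a dictionary (values copied from the table) and recomputes the extremes.

```python
# FM/SMM ratios from Table 3: target -> (5' probe, 3' probe)
fm_smm = {
    "Tgt1": (14, 20),
    "Tgt2": (7, 12),
    "Tgt3": (9, 16),
    "Tgt4": (4, 4),
}
ratios = [r for pair in fm_smm.values() for r in pair]
print(min(ratios), max(ratios))  # 4 20
```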

2. The spotted oligonucleotides can be used as primers (or capture probes) to efficiently ligate target DNA

Oligonucleotide 1 spotted on the slide (Tgt1) served as a capture probe for the target Tgt2-Tgt1-rc, which contains the reverse complement of Tgt1 on its 3' side and the Tgt2 sequence on its 5' side. After hybridization/capture of Tgt2-Tgt1-rc, the ligation probes (Tgt2-5' probe and Tgt2-3' probe) hybridized to the Tgt2 target spots as well as to the Tgt1 spots. FIG. 13 shows the observed ligation signals. Clearly, under these conditions, the spotted oligonucleotide can be used as a primer (or capture probe) to attach target DNA in a form that can be hybridized/ligated with short probes for sequencing.

Claims

1. A system for analyzing a target nucleic acid, the system comprising:

(a) a reaction platform;

(b) an array on the surface of the platform, wherein the array comprises a solid substrate comprising a plurality of regions, each region configured to immobilize a polynucleotide, the polynucleotides being single molecules or amplicons, wherein each single molecule or amplicon comprises a fragment of the target nucleic acid;

(c) a light source arranged for exciting fluorescent molecules on or near the surface;

(d) a megapixel camera disposed above the reaction platform;

(e) a lens arranged to focus the region of the platform such that each single molecule or amplicon of the array on the platform is focused on a single pixel of the camera.

2. The system of claim 1, wherein each region is 1 micron square.

3. The system of claim 1, wherein said array contains more than one million of said regions.

4. The system of claim 1, wherein the light source is a laser, the system further comprising a galvanometer for controlling light from the laser.

5. The system of any one of claims 1-4, wherein the system comprises fragments of the target nucleic acid immobilized on the surface at an average density of one polynucleotide per pixel.

6. The system of claim 5, wherein the system comprises a fluorescently labeled probe that hybridizes to a fragment of the target nucleic acid.

7. The system of any one of claims 1-4, wherein the camera is a CCD camera.

8. The system of any one of claims 1-4, wherein the polynucleotide comprises a fragment of the target nucleic acid and linker sequences at each end of the fragment.

9. The system of claim 8, wherein each region comprises an attached oligonucleotide, wherein the oligonucleotide is complementary to the linker sequence.

10. A method of analyzing a target nucleic acid, the method comprising:

(a) arranging polynucleotides containing fragments of the target nucleic acid onto the reaction platform of the system of any one of claims 1-4 to form an array of average density of one polynucleotide per pixel;

(b) performing a sequencing reaction on the array;

(c) recording signals from each pixel; and

(d) repeating steps (b) through (c) to generate a sequence of the target nucleic acid.