WO2008140494A9

WO2008140494A9 - High throughput screening using microarrays

Info

Publication number: WO2008140494A9
Application number: PCT/US2007/024291
Authority: WO
Inventors: Syed Anwar Hashsham; James M Teidje; Erdogan Gulari; Dieter Tourlousse; Robert Stedtfeld; Farhan Ahmad
Original assignee: Michigan State University MSU
Current assignee: Michigan State University MSU
Priority date: 2006-11-22
Filing date: 2007-11-21
Publication date: 2009-04-30
Anticipated expiration: 2009-05-22
Also published as: WO2008140494A2; WO2008140494A3

Abstract

The present invention relates to the design and manufacture of microarrays for use in detecting multiple organisms in a sample in a high throughput manner. In particular, the present invention relates to the detection of waterborne pathogens in a sample of water.

Description

HIGH THROUGHPUT SCREENING USING MICROARRAYS

This invention was made with government support from the National Institutes of Health; grant numbers 1R01 RR018625-01, 5R01 RR018625-02, 5R01 RR018625-03, and 5R01 RR018625-03. The United States Government has certain rights in the invention.

FIELD OF THE INVENTION

BACKGROUND OF THE INVENTION

Waterborne pathogens are estimated to cause between 2.2 and 5 million deaths per year. Waterborne illnesses continue to be a problem even in developed nations, where outbreaks of unknown etiology are routine and chlorine-resistant pathogens are emerging as a threat to public health. Mackenzie, W. R., et al., A Massive Outbreak in Milwaukee of Cryptosporidium Infection Transmitted through the Public Water-Supply. New England Journal of Medicine, 1994. 331(3): p. 161-167, herein incorporated by reference. Historically, people have been exposed to waterborne pathogens through accidental or inadvertent contamination of their water supply. Additionally, measures are needed for monitoring and guarding against deliberate contamination of the water supply. With the emergence of this double threat, the need for continuous monitoring of the water supply has become a critical issue for public health and national security.

Studies on the Affymetrix platform have shown the feasibility of pathogen detection using high density microarrays. Wilson et al. (High-density microarray of small-subunit ribosomal DNA probes. Applied and Environmental Microbiology, 2002. 68(5): p. 2535-2541, herein incorporated by reference) describe development of a microarray targeting the small-subunit ribosomal DNA of 18 pathogenic bacteria (not all are human pathogens). In another paper, Wilson et al. (Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Molecular and Cellular Probes, 2002. 16(2): p. 119-127, herein incorporated by reference) developed a multi-pathogen identification microarray to detect 18 organisms targeting various genes. Both microarrays were validated with DNA obtained from air filtrate.

A common problem with such assays is the development of multiplexed assays for multiple microorganisms. Current strategies for probe design for multiplex detection of organisms often limit the number of available probes and, in turn, the robustness and accuracy of the assays. What is needed in the art are better methods of identifying probes for use in multiplex assays so that more robust and accurate assays can be designed.

SUMMARY OF THE INVENTION

The present invention relates to the design and manufacture of microarrays for use in detecting multiple organisms in a sample in a high throughput manner. In particular, the present invention relates to the detection of waterborne pathogens in a sample of water. The present invention provides methods for identifying regions of DNA that will specifically identify groups of pathogenic organisms. In some embodiments, the present invention provides methods for identifying regions of DNA that will specifically identify a single pathogenic organism. In some embodiments, the present invention provides methods for providing DNA probes for identifying regions of pathogen DNA. In further embodiments, the present invention provides methods for providing DNA probes for identifying subregions of pathogen DNA. In some embodiments, the present invention provides methods for providing nonoverlapping DNA probes.

For the purposes of the present inventions, the "maximum number of nonoverlapping probes" depends on the size of the gene sequence and the predetermined length of the probes. For example, where the gene sequence is 1000 bases and the predetermined length of the probes is 100 bases, the maximum number of nonoverlapping probes is 10. Where the gene sequence is 1000 bases and the predetermined length of the probes is 50 bases, the maximum number of nonoverlapping probes is 20 (and so on). The probes can be "generated" electronically or chemically (synthetically) or even enzymatically. A "subregion" is smaller in size than the complete gene sequence. Typical subregions are in the size range of 50-500 nucleotides. The predetermined number of database microorganisms depends on the degree of specificity desired. In one embodiment, the total database comprises 100 organisms and the predetermined number is 5. In one embodiment, the total database comprises 100 organisms and the predetermined number is 4. In one embodiment, the total database comprises 100 organisms and the predetermined number is 3. In one embodiment, the total database comprises 100 organisms and the predetermined number is 2. In one embodiment, the total database comprises 100 organisms and the predetermined number is 1.
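The arithmetic in the preceding paragraph can be illustrated with a short sketch. The function name is hypothetical, and the sketch assumes that only full-length probes count, so a trailing stretch of the gene sequence shorter than one probe is ignored.

```python
def max_nonoverlapping_probes(gene_length: int, probe_length: int) -> int:
    """Count the full-length probes that tile a sequence end to end without overlap.

    Assumption: a leftover stretch shorter than one probe is not counted.
    """
    if gene_length < 0 or probe_length <= 0:
        raise ValueError("gene_length must be >= 0 and probe_length must be > 0")
    return gene_length // probe_length


# The two examples given in the text.
print(max_nonoverlapping_probes(1000, 100))  # 10
print(max_nonoverlapping_probes(1000, 50))   # 20
```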

In some embodiments, the present invention provides methods of providing target probes for detecting a target pathogen comprising, a) generating a maximum number of nonoverlapping probes of a predetermined length from a gene sequence derived from a target microorganism, b) comparing the sequence of each nonoverlapping probe to sequences from a plurality of database microorganisms, c) identifying at least one subregion in the gene sequence derived from a target microorganism, wherein the at least one subregion contains exact matches to less than a predetermined number of the database microorganisms; and d) generating a set of overlapping probes of a predetermined length for the at least one subregion. The present invention is not limited to any particular length of nonoverlapping probes. In some embodiments, the predetermined length of the nonoverlapping probes is from 10 to 30 bases. In some embodiments, the predetermined length of the nonoverlapping probes is from 30 to 60 bases. The present invention is not limited to subregions containing exact matches to any particular number of database microorganisms. In some embodiments, the subregion contains exact matches to less than from 5 up to 100 database microorganisms. The present invention is not limited to subregions of any particular length. In some embodiments, the subregion is from about 50 to about 500 bases in length. In some embodiments, multiple subregions containing exact matches to less than a predetermined number of database sequences are identified. In some preferred embodiments, overlapping probes of a predetermined length are generated for the multiple subregions. In further preferred embodiments, the set of overlapping probes of a predetermined length are from 15 to 30 bases in length. In other preferred embodiments, the set of overlapping probes of a predetermined length are from 40 to 60 bases in length. 
In some embodiments, the methods of the present invention further comprise the step of selecting at least one probe for the at least one subregion. In some embodiments, the at least one probe for the at least one subregion has at least one base pair difference as compared to the sequences of the database microorganisms. In some embodiments, the methods of the present invention further comprise the step of arraying the at least one probe on a solid surface.
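Steps a) through d) above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used for the invention: all function names are hypothetical, the database is modeled as a simple mapping of organism name to sequence, an "exact match" is taken to mean that the probe occurs as a substring of a database sequence, and adjacent qualifying probes are merged into one subregion.

```python
from typing import Dict, List, Tuple


def tile_nonoverlapping(seq: str, k: int) -> List[Tuple[int, str]]:
    """Step a: end-to-end k-mers as (start offset, probe); a partial tail is dropped."""
    return [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]


def count_exact_matches(probe: str, database: Dict[str, str]) -> int:
    """Step b: number of database organisms whose sequence contains the probe exactly."""
    return sum(probe in organism_seq for organism_seq in database.values())


def find_subregions(seq: str, database: Dict[str, str], k: int,
                    max_matches: int) -> List[Tuple[int, int]]:
    """Step c: merge runs of adjacent probes with fewer than max_matches hits
    into (start, end) subregions, with end exclusive."""
    regions: List[Tuple[int, int]] = []
    run_start = None
    run_end = 0
    for start, probe in tile_nonoverlapping(seq, k):
        if count_exact_matches(probe, database) < max_matches:
            if run_start is None:
                run_start = start
            run_end = start + k
        elif run_start is not None:
            regions.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        regions.append((run_start, run_end))
    return regions


def overlapping_probes(seq: str, start: int, end: int, k: int) -> List[str]:
    """Step d: every k-mer inside [start, end), stepping one base at a time."""
    return [seq[i:i + k] for i in range(start, end - k + 1)]


# Toy data (hypothetical sequences, far shorter than real genes).
target = "AAAACCCCGGGGTTTT"
database = {"org1": "TTAAAATT", "org2": "GAAAAG", "org3": "TTTTAAAA"}

regions = find_subregions(target, database, k=4, max_matches=1)
print(regions)  # [(4, 12)]
for region_start, region_end in regions:
    print(overlapping_probes(target, region_start, region_end, k=4))
    # ['CCCC', 'CCCG', 'CCGG', 'CGGG', 'GGGG']
```

In this toy case the first and last probes of the target match at least one database organism and are excluded, so a single subregion remains and is tiled with overlapping probes. Candidate probes would still need to be selected from that set, as described above.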

In some embodiments, the present invention provides a set of nucleic acid probes generated by the foregoing methods. In other embodiments, the present invention provides microarrays comprising the foregoing set of nucleic acid probes. In further embodiments, the present invention provides a microarray comprising nucleic acid probes for a target organism, wherein the probes are selected by generating a maximum number of nonoverlapping probes of a predetermined length from a gene sequence derived from the target organism, comparing the sequence of each nonoverlapping probe to sequences from a plurality of database microorganisms, identifying at least one subregion in the gene sequence derived from a target microorganism, wherein the at least one subregion contains exact matches to less than a predetermined number of the database microorganisms; and generating a set of overlapping probes of a predetermined length for the at least one subregion. The present invention is not limited to microarrays formed on any particular substrate. In some embodiments, the probes are arrayed on a glass slide.

In some embodiments, the present invention provides methods for high throughput detection of multiple target organisms comprising a) providing: i) a microarray comprising sets of multiple discrete nucleic acid probes for at least one target organism, wherein the nucleic acid probes are complementary to multiple amplicons from multiple target genes of the at least one target organism; ii) primers complementary to multiple target genes of the multiple target organisms; b) using the primers to amplify the multiple target genes from the multiple target organisms to produce amplicons; c) contacting the microarray with the amplicons under conditions such that the amplicons hybridize to the discrete nucleic acid probes; d) detecting the presence of amplicon binding to the discrete nucleic acid probes. In some embodiments, the methods provide a set of nucleic acid primers. In some embodiments, the methods provide a set of nucleic acid probes. In some embodiments, the methods provide sequences selected from the group consisting of SEQ ID Nos. 1-19206. In some embodiments, the methods provide sequences selected from the group consisting of SEQ ID Nos. 1-791. In some embodiments, the methods provide sequences selected from the group consisting of SEQ ID Nos. 792-11533. In some embodiments, the methods provide sequences selected from the group consisting of SEQ ID Nos. 11534-13225. In some embodiments, the methods provide sequences selected from the group consisting of SEQ ID Nos. 13226-19206.

In some embodiments, the present invention provides a microarray comprising the set of nucleic acid probes generated by methods of the present inventions. In some embodiments, the microarray comprises at least one sequence selected from the group consisting of SEQ ID Nos. 1-19206.

In some embodiments, the present invention provides a microarray comprising nucleic acid probes for a target organism, wherein said probes are selected by generating a maximum number of nonoverlapping oligonucleotide probes of a predetermined length from a gene sequence derived from said target organism, comparing the sequence of each nonoverlapping oligonucleotide probe to sequences from a plurality of database microorganisms, identifying at least one subregion in said gene sequence derived from a target microorganism, wherein said at least one subregion contains exact matches to less than a predetermined number of said database microorganisms; and generating a set of overlapping probes of a predetermined length for said at least one subregion. In some embodiments, the oligonucleotide probes are arrayed on a substrate selected from the group consisting of a glass slide and a serpentine chip. In some embodiments, the target organisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, E. coli, and combinations thereof. In some embodiments, the oligonucleotide probes are complementary to virulence marker genes from said target organisms.

In some embodiments, the present invention provides a microarray for high throughput detection of waterborne pathogens comprising sets of multiple discrete nucleic acid probes for at least ten target organisms, wherein said nucleic acid probes are complementary to multiple amplicons from multiple virulence marker genes from each target organism and multiple discrete nucleic acid probes are provided for each amplicon, and wherein said at least 10 target microorganisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, and E. coli. In some embodiments, the nucleic acid probes are selected from the group consisting of oligonucleotide probes encoded by SEQ ID Nos. 1-19206 and combinations thereof.

In some embodiments, the present invention provides a method for high throughput detection of multiple target organisms comprising a) providing: i) a microarray comprising sets of multiple discrete nucleic acid probes for multiple target organisms, wherein said nucleic acid probes are complementary to multiple amplicons from multiple target genes of said at least one target organism, and wherein multiple discrete nucleic acid probes are provided for each amplicon; ii) primers complementary to multiple target genes of said multiple target organisms; and b) using said primers to amplify said multiple target genes from said multiple target organisms to produce amplicons; c) contacting said microarray with said amplicons under conditions such that said amplicons hybridize to said discrete nucleic acid probes; and d) detecting the presence of amplicon binding to said discrete nucleic acid probes. In some embodiments, the target organisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, E. coli, and combinations thereof. In some embodiments, the probes are complementary to virulence marker genes from said target organisms. In some embodiments, the nucleic acid probes are selected from the group consisting of probes encoded by SEQ ID Nos. 1-19206 and combinations thereof. The present invention provides a set of oligonucleotide probes consisting of at least 2 sequences selected from the group consisting of SEQ ID Nos. 1-19206.

The present invention provides a microarray consisting of at least 2 sequences selected from the group consisting of SEQ ID Nos. 1-19206. In some embodiments, the microarray is a serpentine chip.

DESCRIPTION OF THE FIGURES

FIG. 1 shows an exemplary result of testing microbial sequences obtained by compositions and methods of the present inventions for a 16S rDNA sequence. Panel A: Graph of 73 non-overlapping 20-mers. Each 20-mer sequence was then used to obtain the number of matches (exact) in The Ribosomal Database Project II (RDP-II) database and plotted on the y-axis. Panel B: For the region from 135 to 220 yielding the first and second regions with 10 matches or less (in the left panel), the 20-mer probes were tested for matches in RDP-II by walking one base at a time.
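The one-base walk described for Panel B can be illustrated with the following sketch, which records the number of exact database matches for the probe starting at each position of a region. The function name and the toy sequences are hypothetical. The sketch assumes zero-based coordinates with an exclusive end, a convention the figure description does not specify; under that assumption a region from 135 to 220 holds 220 - 135 - 20 + 1 = 66 candidate 20-mers.

```python
from typing import List, Sequence, Tuple


def walk_match_profile(seq: str, start: int, end: int, k: int,
                       database: Sequence[str]) -> List[Tuple[int, int]]:
    """Slide a k-mer window one base at a time over seq[start:end] and return
    (window start, number of database sequences containing that k-mer exactly)."""
    profile = []
    for i in range(start, end - k + 1):
        probe = seq[i:i + k]
        profile.append((i, sum(probe in db_seq for db_seq in database)))
    return profile


# Toy example: 4-mers walked across a 14-base sequence.
seq = "ACGTACGTTTGACC"
database = ["ACGTACGT", "CGTTTG"]
profile = walk_match_profile(seq, 0, len(seq), 4, database)

# Window positions whose probe has no exact match in the toy database.
unmatched = [pos for pos, hits in profile if hits == 0]
print(unmatched)  # [8, 9, 10]
```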

FIG. 2 shows an exemplary evaluation of 791 targeted and 2,034 not-targeted probes with a composite target sample containing 47 VMG amplicons (24 labeled with Cy3 and 23 labeled with Cy5). (A) Part of the Xeotron chip after hybridization and washing up to 30 °C; spot features are 50 μm in diameter. (B) Distribution and boxplot of SNRs for 791 targeted (black bars and lower boxplot) and 2,034 not-targeted probes (grey bars and upper boxplot). (C) Distribution of the positive fraction for 47 targeted (black bars) and 67 not-targeted probe sets (grey bars). The dashed line shows the cut-off (0.5) for presence/absence calls. (D) Success rate of probe design. The number of probes showing positive signal (y-axis) vs. the number of designed probes (x-axis) for each VMG (black symbols: targeted VMGs, and grey symbols: not-targeted VMGs). The dashed lines represent varying levels of success. The slope noted for each of these lines is equal to the positive fraction.
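The positive fraction and the presence/absence call described for panels (C) and (D) can be sketched as follows. The 0.5 cut-off comes from the figure description. The per-probe SNR threshold of 3.0, the use of "greater than or equal to" at both thresholds, and the function names are assumptions made for illustration only.

```python
from typing import Sequence


def positive_fraction(snrs: Sequence[float], snr_threshold: float) -> float:
    """Fraction of probes in a probe set whose SNR reaches the threshold,
    i.e. probes with positive signal divided by probes designed."""
    if not snrs:
        raise ValueError("probe set is empty")
    return sum(snr >= snr_threshold for snr in snrs) / len(snrs)


def is_present(snrs: Sequence[float], snr_threshold: float = 3.0,
               call_cutoff: float = 0.5) -> bool:
    """Presence/absence call for one VMG amplicon.

    snr_threshold is a hypothetical value; call_cutoff is the 0.5 cut-off
    shown as the dashed line in panel (C).
    """
    return positive_fraction(snrs, snr_threshold) >= call_cutoff


# Hypothetical SNR values for two four-probe sets.
print(positive_fraction([12.0, 8.5, 1.2, 4.0], 3.0))  # 0.75
print(is_present([12.0, 8.5, 1.2, 4.0]))              # True
print(is_present([0.8, 1.1, 5.0, 2.0]))               # False
```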

FIG. 3 shows an exemplary targeted probe response and characterization for each of 47 targeted VMG amplicons. (a) Boxplots represent the variability in SNRs obtained for each VMG, (b) positive fractions, and (c) G+C content of the amplified region, with each symbol representing a VMG. Probe sets are sorted from bottom to top according to increasing median signal-to-noise ratio.

FIG. 4 shows an exemplary success rate of probe design as function of free energy of probe-target duplex formation (ΔG°duplex). Each symbol represents 40 probes with error bars reflecting the standard deviation among three replicates. The percent positive probes is plotted at the median ΔG°duplex of each bin. The G+C content on the upper X-axis was derived from a linear relationship between ΔG°duplex and G+C content.

FIG. 5 shows an exemplary performance of a probe set for a VMG biochip with DNA extracted from several types of water samples. Genomic DNA (10 pg) of each pathogen was spiked into 10 ng of background DNA, yielding a relative abundance for each pathogen of 0.1%. Spiked samples and unspiked background samples were labeled with Cy3 and Cy5, respectively, mixed and hybridized in duplicate. For spiked samples, 673 probes (46 VMG amplicons, marker gene targets chosen from validated probes as described herein) were analyzed. For unspiked samples, 2,034 probes (67 VMG amplicons) were analyzed. Boxplots represent the distribution of the positive fraction among the VMG amplicons. The dashed line represents the selected threshold for presence/absence (positive/negative) calls.
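The 0.1% relative abundance stated for FIG. 5 follows from a unit conversion, sketched below. The function name is hypothetical, and the sketch assumes that abundance is expressed relative to the background DNA mass alone, which is how 10 pg in 10 ng gives exactly 0.1%.

```python
PG_PER_NG = 1000.0  # picograms per nanogram


def relative_abundance_percent(spike_pg: float, background_ng: float) -> float:
    """Spiked pathogen DNA mass as a percentage of the background DNA mass."""
    if background_ng <= 0:
        raise ValueError("background mass must be positive")
    return 100.0 * spike_pg / (background_ng * PG_PER_NG)


# 10 pg of pathogen genomic DNA in 10 ng of background DNA.
abundance = relative_abundance_percent(10.0, 10.0)
print(f"{abundance:.1f}%")  # 0.1%
```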

FIG. 6 shows an exemplary derivation of optimal probe design criteria demonstrating a dependence of positive fraction on Gibbs free energy of duplex formation. (a) Selectivity, calculated as the difference in the percent positive probes for targeted and not-targeted probes, as function of ΔG°duplex. Maximum selectivity (~80%) was obtained for probes with a ΔG°duplex of -19.3 kcal/mole and a corresponding G+C content of 47.2%. The shaded area represents the ΔG°duplex and G+C content range yielding a selectivity higher than 70%. (b) Percent positive probes as function of free energy of probe-target duplex formation (ΔG°duplex) for targeted and not-targeted probes (black circles and diamonds, respectively). Each symbol represents 40 targeted and 80 not-targeted probes with error bars reflecting the standard deviation among samples and replications (n = 6 for targeted probes, and n = 4 for not-targeted probes). The G+C content on the upper X-axis was derived from a linear relationship between ΔG°duplex and G+C content. For spiked samples (black circles), 673 probes were analyzed. For unspiked samples (black diamonds), 2,034 probes were analyzed. Probes were sorted according to ΔG° and binned (40 probes for targeted probes and 80 probes for not-targeted probes). The y-axis refers to the fraction of probes within each bin resulting in a positive signal.
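The sorting, binning, and selectivity calculation described for FIG. 6 can be sketched as follows. The function names and toy values are hypothetical. The sketch assumes that a final bin with fewer probes than the bin size is kept rather than discarded, and it leaves open how targeted and not-targeted bins of different sizes are paired on the ΔG° axis, which the figure description does not specify.

```python
from statistics import median
from typing import List, Sequence, Tuple


def percent_positive_by_bin(probes: Sequence[Tuple[float, bool]],
                            bin_size: int) -> List[Tuple[float, float]]:
    """Sort (delta_g, is_positive) pairs by delta_g, split them into
    consecutive bins, and return (median delta_g, percent positive) per bin."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(probes, key=lambda probe: probe[0])
    bins = []
    for i in range(0, len(ordered), bin_size):
        chunk = ordered[i:i + bin_size]
        delta_gs = [delta_g for delta_g, _ in chunk]
        positives = sum(1 for _, is_positive in chunk if is_positive)
        bins.append((median(delta_gs), 100.0 * positives / len(chunk)))
    return bins


def selectivity(pct_targeted: float, pct_not_targeted: float) -> float:
    """Difference in percent positive probes, targeted minus not-targeted."""
    return pct_targeted - pct_not_targeted


# Toy data: four probes in bins of two (the figure uses bins of 40 and 80).
toy = [(-22.0, True), (-18.0, False), (-20.0, True), (-16.0, False)]
print(percent_positive_by_bin(toy, 2))  # [(-21.0, 100.0), (-17.0, 0.0)]
print(selectivity(90.0, 10.0))          # 80.0
```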

FIG. 7A-O shows an exemplary list of 20 mer probes for waterborne pathogens.

FIG. 8 shows an exemplary list of microorganisms and virulence marker genes for use in compositions and methods of the present inventions.

FIG. 9 shows an exemplary description of multiplex PCR reactions and the specificity and sensitivity achievable with the probe design of the present invention.

FIG. 10 shows an exemplary Serpentine Chip, developed for use in compositions and methods of the present inventions. Note the solid (nontransparent) base.

FIG. 11 shows an exemplary microfluidic DNA biochip with recirculation capabilities for detecting and quantifying infectious agents by hybridizing PCR amplified products to oligonucleotide probes as provided herein (see, Figures 12 and 13): (a) the exemplary chip was approximately 1 cm², (b) a close-up view of microfluidic channels and a portion of the approximately 8,000 reactors on the chip, (c) a close-up view of 6 reactors, each with a 50 micron diameter, (d) a signal to noise ratio for 5 genes belonging to one of the 20 organisms that were tested on the chip, and (e) laser scanned signal intensities for part of the chip experimental results (a green signal represents a positive hybridization result).

FIG. 12 shows an exemplary list of 50 mer oligonucleotide probes for identifying waterborne pathogens developed and used as described herein.

FIG. 13 shows an exemplary list of validated 50 mer oligonucleotide probes for identifying waterborne pathogens developed and used as described herein.

FIG. 14 shows exemplary results from spot testing 10,000 50 mer probes (see, Figure 12) attached to a Serpentine Chip. Oligonucleotide probes were synthesized as described herein, and hybridization products were labeled with a SY3 dye. (A) Results were visualized by a laser scanner; green areas were positive and dark spots were negative. (B) demonstrates exemplary low signal to noise ratios.

FIG. 15 shows an exemplary confirmation of amplification in a serpentine PCR chip demonstrating reaction products obtained from a nonleaking chip: (a) microfluidic channel, (b) PCR product detectable after the 15th cycle, and (c) demonstration of success obtaining the expected size PCR product by routine gel electrophoresis.

FIG. 16 shows the stability of exemplary freeze-dried PCR reagents (A) Optimization of trehalose concentration for freeze-dried Taq Polymerase and (B) Stability of freeze-dried PCR reagents with 15% Trehalose.

FIG. 17 shows exemplary results from a helicase-dependent isothermal amplification.

DEFINITIONS

To facilitate an understanding of the present invention, a number of terms and phrases are defined below:

The use of the article "a" or "an" is intended to include one or more. As used in this application, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. For example, the term "an agent" includes a plurality of agents, including mixtures thereof.

As used herein, the term "serpentine chip" refers to a chip with serpentine shaped microchannels.

As used herein, the term "arraying" in reference to a primer or probe refers to the use of the primer or probe in a microarray, for example, by attaching the probe or primer to the microarray surface, adding the probe or primer to the microarray, and the like.

As used herein, the term "transducer device" refers to a device that is capable of converting a non-electrical phenomenon into electrical information, and transmitting the information to a device that interprets the electrical signal. Such devices can include, but are not limited to, devices that use photometry, fluorometry, and chemiluminescence; fiber optics and direct optical sensing (e.g., grating coupler); surface plasmon resonance; potentiometric and amperometric electrodes; field effect transistors; piezoelectric sensing; and surface acoustic wave.

As used herein, the term "optical transparency" refers to the property of matter whereby the matter is capable of transmitting light such that the light can be observed by visual light detectors (e.g., eyes and detection equipment).

As used herein, "electroluminescence" or "EL" refer to a direct conversion of electrical energy into light.

As used herein, "electroluminescent sheet" and "electroluminescent film" and "ELM" and "electroluminescent panel" and "electroluminescent wire" and "EL lamp" refer to a type of capacitor comprising a thin layer of light emitting phosphor located between two electrodes, wherein the first electrode is opaque and the second electrode is translucent to allow light to escape. For example, AC current (400-1600 Hz) is applied to a phosphor, resulting in the emission of light, wherein phosphor chemical composition and dye pigments determine the brightness and color of the emitted light. Examples of electroluminescent compositions include ruthenium, ruthenium polypyridyl complex, tris-bipyridine, tris(bipyridine) Ru(II) complexes, tripropylamine, Ruthenium Tris(2,2')bipyridyl/Tripropylamine, and the like.

As used herein, "phosphor" refers to a substance that exhibits the phenomenon of phosphorescence, such as a natural substance, for example a transition metal compound or rare earth compounds of various types or synthetic, for example, a suitable host material, to which an activator is added such as a copper-activated zinc sulfide and the silver-activated zinc sulfide (zinc sulfide silver).

As used herein, "ACTFEL" and "alternating current thin film electroluminescence" refer to emitted light following exposure to an electrical current.

As used herein, the term "film" refers to any substance capable of coating at least a portion of a substrate surface and immobilizing capture particles. Examples of materials used to make such films include, but are not limited to, agarose, acrylamide, SEPHADEX, proteins (e.g. bovine serum albumin (BSA), polylysine, collagen, etc.), hydrogels (e.g. polyethylene oxide, polyvinyl alcohol, polyhydroxyl butylate, etc.), film forming latexes (e.g. methyl and ethyl acrylates, vinylidene chloride, and copolymers thereof), or mixtures thereof. In certain embodiments, films include additional material such as plasticisers (e.g. polyethylene glycol [PEG], detergents, etc.) to improve stability and/or performance of the film. In preferred embodiments, a film is a material that will react with the capture particles and present them in the same focal plane. In other preferred embodiments, a film is pre-activated with cross-linking groups such as aldehydes, or groups added after the film has been formed.

As used herein, "optical signal" refers to any energy (e.g. photodetectable energy) emitted from a sample (e.g., produced from a microarray that has one or more optically excited [i.e., by electromagnetic radiation] molecules bound to its surface).

As used herein, "filter" refers to a device or coating that preferentially allows light of a characteristic spectra to pass through it (e.g., the selective transmission of light beams).

"Polychromatic" or "broadband" as used herein, refers to a plurality of electromagnetic wavelengths emitted from a light source.

As used herein, "microarray" refers to a substrate with a plurality of molecules (e.g., nucleotides) bound to its surface. Microarrays, for example, are described generally in Schena, "Microarray Biochip Technology," Eaton Publishing, Natick, Mass., 2000. Additionally, the term "patterned microarrays" refers to microarray substrates with a plurality of molecules non-randomly bound to its surface.

As used herein, the term "optical detector" or "photodetector" refers to a device that generates an output signal when irradiated with optical energy. Thus, in its broadest sense the term optical detector system is taken to mean a device for converting energy from one form to another for the purpose of measurement of a physical quantity or for information transfer. Optical detectors include but are not limited to photomultipliers and photodiodes.

As used herein, the term "photomultiplier" or "photomultiplier tube" refers to optical detection components that convert incident photons into electrons via the photoelectric effect and secondary electron emission. The term photomultiplier tube is meant to include devices that contain separate dynodes for current multiplication as well as those devices which contain one or more channel electron multipliers.

As used herein, the term "photodiode" refers to a solid-state light detector type including, but not limited to PN, PIN, APD and CCD.

As used herein, the terms "plate reader" in reference to a "detection device" refer to a device to detect the transmission of light through or reflection of light (i.e., polarized light or non-polarized light of specific wavelengths) from the surface of an assay; for the purposes of the present invention, the assay is a "chip" or "PCR chip," a "glass slide" comprising a PCR assay, or a "plate" such as a 96-well plate, and the like. For example, a microtiter plate reader measures transmittance, absorbance, or reflectance through, in, or from each well of a multitest device such as a microtiter testing plate (e.g., MicroPlate™ testing plates) or a miniaturized testing card (e.g., MicroCard™ miniaturized testing cards).

As used herein, "thin layer" refers to a very thin deposition of a colloidal substance (phosphor, dielectric, silver) onto the ITO coated glass plate.

As used herein, "ITO" refers to Indium Tin Oxide (In2O3:SnO2), a thin layer of indium oxide that has been doped with tin, forming a transparent, conductive coating on a glass plate.

As used herein, "phosphor" refers to a powder made of a material such as zinc sulfide, doped with either copper or manganese to achieve the emission colors when exposed to an electric field.

As used herein, "dielectric layer" refers to an insulating layer that serves to even out the electric field across the phosphor layer and prevents short circuits.

As used herein, "capacitor" refers to an electrical device that can store energy in the electric field between a pair of closely spaced conductors or 'plates'. When current is passed through the capacitor, electric charges of equal magnitude, but opposite sign, build up on each plate.

As used herein, "electrode" refers to a plate of a capacitor, for example, one front electrode, such as an electrode comprising transparent ITO, and one back electrode, such as an electrode comprising silver.

As used herein, "electronic power supply" refers to an electronic device that produces a particular DC voltage or current from a source of electricity such as a battery or wall outlet.

As used herein, "power adapter," "transformer," or "power supply" refer to an external power supply for laptop computers or portable or semi-portable electronic devices.

As used herein, "AC adapter" refers to a rectifier to convert AC current to DC and a transformer to convert voltage from 120 down to 9, 12, 15 or whatever is required.

As used herein, "power supply" refers to an electrical system that converts AC current from the wall outlet into the DC currents required by the computer circuitry. A computer power supply typically generates multiple voltages. For example, 12 volts is used for drives, and either 3.3 or 5 volts is used for the electronic circuitry.

As used herein, "external AC adaptor power brick" refers to an electronic device that produces AC current.

As used herein, "AC powered linear power supply" refers to a transformer to convert the voltage from the wall outlet to a lower voltage. An array of diodes called a diode bridge then rectifies the AC voltage to DC voltage. A low-pass filter smoothes out the voltage ripple that is left after the rectification. Finally a linear regulator converts the voltage to the desired output voltage, along with other possible features such as current limiting.

As used herein, "AC current" and "Alternating Current" and "AC" refers to a type of electrical current, the direction of which is reversed at regular intervals or cycles. In the United States, the standard is 120 reversals or 60 cycles per second.

As used herein, "DC current" and "Direct Current" and "DC" refer to a type of electricity transmission and distribution by which electricity flows in one direction through the conductor, usually relatively low voltage and high current. For typical 120 volt or 220 volt devices, DC must be converted to alternating current. As used herein, "battery" refers to a device that stores chemical energy and makes it available in an electrical form. Batteries comprise electrochemical devices such as one or more galvanic cells, fuel cells, or flow cells; examples include lead acid, nickel cadmium, nickel metal hydride, lithium ion, lithium polymer, CMOS batteries, and the like.

As used herein, "CMOS battery" refers to a battery that maintains the time, date, hard disk and other configuration settings in the CMOS memory.

As used herein, "frequency" refers to a number of oscillations (vibrations) in one second. Frequency f is the reciprocal of the time T taken to complete one cycle (the period), or 1/T. The frequency with which earth rotates is once per 24 hours. Frequency is usually expressed in units called hertz (Hz). Frequency is measured in terms of "hertz" or "Hz," which refer to "oscillations per second" or "cycles per second." For example, "one hertz" or "1 Hz" is equal to one cycle per second. For example, "one kilohertz" or "kHz" is 1,000 Hz, and "one megahertz" or "MHz" is 1,000,000 Hz. For example, the alternating current in a wall outlet in the U.S. and Canada is 60 Hz. Electromagnetic radiation is measured in kilohertz (kHz), megahertz (MHz) and gigahertz (GHz).
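
By way of illustration only, the reciprocal relationship between frequency and period stated above may be sketched as follows. The function names are hypothetical and are not part of the described invention; the sketch assumes the period is given in seconds.

```python
# Minimal sketch: frequency f as the reciprocal of the period T (f = 1/T).
# All names here are illustrative assumptions, not terms from the specification.

def frequency_hz(period_s: float) -> float:
    """Return the frequency in hertz for a period given in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def to_khz(freq_hz: float) -> float:
    """Convert hertz to kilohertz (1 kHz = 1,000 Hz)."""
    return freq_hz / 1_000.0


def to_mhz(freq_hz: float) -> float:
    """Convert hertz to megahertz (1 MHz = 1,000,000 Hz)."""
    return freq_hz / 1_000_000.0


# One rotation of the earth per 24 hours, expressed in hertz.
earth_rotation_hz = frequency_hz(24 * 3600)

# One cycle of 60 Hz wall-outlet current lasts 1/60 of a second.
mains_hz = frequency_hz(1.0 / 60.0)
```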

As used herein, "inverter" or "rectifier" refer to a device that converts direct current electricity to alternating current either for stand-alone systems or to supply power to an electricity grid.

As used herein, "volt" and "V" refer to a unit of electrical force equal to that amount of electromotive force that will cause a steady current of one ampere to flow through a resistance of one ohm. As used herein, "voltage" refers to an amount of electromotive force, measured in volts, that exists between two points.

As used herein, "Ohm" refers to a measure of the electrical resistance of a material equal to the resistance of a circuit in which the potential difference of 1 volt produces a current of 1 ampere.

As used herein, "Ampere" and "amp" refer to a unit of electrical current or rate of flow of electrons, such that one volt across one ohm of resistance causes a current flow of one ampere.

As used herein, "watt" or "W" refer to a measure of power, i.e., Volts multiplied by Amps = Watts. Watt may also refer to a rate of energy transfer equivalent to one ampere under an electrical pressure of one volt; for example, one watt equals 1/746 horsepower, or one joule per second, i.e., voltage x current = wattage.
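
By way of illustration only, the relationships among volts, ohms, amperes, and watts defined above may be sketched as follows. The function names are hypothetical; the conversion factor of 746 watts per horsepower is taken from the definition of "watt" above.

```python
# Minimal sketch of the unit relationships in the definitions above.
# Names are illustrative assumptions, not terms from the specification.

WATTS_PER_HORSEPOWER = 746.0  # from the definition: one watt = 1/746 horsepower


def current_amps(volts: float, ohms: float) -> float:
    """One volt across one ohm of resistance causes a current of one ampere."""
    if ohms <= 0:
        raise ValueError("resistance must be positive")
    return volts / ohms


def power_watts(volts: float, amps: float) -> float:
    """Volts multiplied by amps gives watts."""
    return volts * amps


def watts_to_horsepower(watts: float) -> float:
    """Convert watts to horsepower."""
    return watts / WATTS_PER_HORSEPOWER


# Example: a 12 V supply across a 6 ohm load.
load_current = current_amps(12.0, 6.0)
load_power = power_watts(12.0, load_current)
```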

As used herein, "Charge-Coupled Device" and "CCD" refer to an electronic memory that records the intensity of light as a variable charge. As used herein, "storage CCDs" refers to either a separate array (frame transfer) or individual photosites (interline transfer) coupled to each imaging photosite.

As used herein, "CMOS" or "Complementary-symmetry/metal-oxide semiconductor" refers to both a particular style of digital circuitry design and the family of processes used to implement that circuitry on integrated circuits (chips). As used herein, "CMOS IMAGE SENSOR" refers to a "CMOS-based chip" that records intensities of light as variable charges similar to a CCD chip. In one embodiment, a CMOS chip uses less power than a CCD chip.

As used herein, "optical signal" refers to any energy (e.g., photodetectable energy) from a sample (e.g., produced from a microarray that has one or more optically excited [i.e., by electromagnetic radiation] molecules bound to its surface).

As used herein, "filter" refers to a device or coating that preferentially allows light of a characteristic spectrum to pass through it (e.g., the selective transmission of light beams).

"Polychromatic" and "broadband" as used herein, refer to a plurality of electromagnetic wavelengths emitted from a light source or sample. As used herein, the term "micromirror array" refers to a plurality of individual light reflecting surfaces that are addressable (e.g., electronically addressable in any combination), such that one or more individual micromirrors can be selectively tilted, as desired. As used herein, the terms "optical detector" and "photodetector" refer to a device that generates an output signal when exposed to optical energy. Thus, in its broadest sense, the term "optical detector system" refers to devices for converting energy from one form to another for the purpose of measurement of a physical quantity and/or for information transfer. Optical detectors include but are not limited to photomultipliers and photodiodes, as well as fluorescence detectors.

As used herein, the term "TTL" stands for Transistor-Transistor Logic, a family of digital logic chips that comprise gates, flip-flops, counters, etc. The family uses zero Volt and five Volt signals to represent logical "0" and "1" respectively.

As used herein, the term "dynamic range" refers to the range of input energy over which a detector and data acquisition system is useful. This range encompasses the lowest level signal that is distinguishable from noise to the highest level that can be detected without distortion or saturation.

As used herein, the term "noise" in its broadest sense refers to any undesired disturbances (i.e., signal not directly resulting from the intended detected event) within the frequency band of interest. One example of noise is the summation of unwanted or disturbing energy introduced into a system from man-made and natural sources. In another example, noise may distort a signal such that the information carried by the signal becomes degraded or less reliable.

As used herein, the term "signal-to-noise ratio" refers to the ability to resolve true signal from the noise of a system. One example of computing a signal-to-noise ratio is by taking the ratio of levels of the desired signal to the level of noise present with the signal. In preferred embodiments of the present invention, phenomena affecting signal-to-noise ratio include, but are not limited to, detector noise, system noise, and background artifacts. As used herein, the term "detector noise" refers to undesired disturbances (i.e., signal not directly resulting from the intended detected energy) that originate within the detector. Detector noise includes dark current noise and shot noise. Dark current noise in an optical detector system results from the various thermal emissions from the photodetector. Shot noise in an optical system is the product of the fundamental particle nature (i.e., Poisson-distributed energy fluctuations) of incident photons as they pass through the photodetector.

As used herein, the term "system noise" refers to undesired disturbances that originate within the system. System noise includes, but is not limited to, noise contributions from signal amplifiers, electromagnetic noise that is inadvertently coupled into the signal path, and fluctuations in the power applied to certain components (e.g., a light source).

As used herein, the term "background artifacts" refers to signal components caused by undesired optical emissions from the microarray. These artifacts arise from a number of sources, including: non-specific hybridization, intrinsic fluorescence of the substrate and/or reagents, incompletely attenuated fluorescent excitation light, and stray ambient light. In some embodiments, the noise of an optical detector system is determined by measuring the noise of the background region and noise of the signal from the microarray feature.
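
By way of illustration only, one way of computing a signal-to-noise ratio from a microarray feature and its background region may be sketched as follows. The specific formula (background-subtracted mean feature intensity divided by the standard deviation of the background intensities) is an assumption chosen for this sketch; the specification does not prescribe a particular computation, and the function and parameter names are hypothetical.

```python
# Minimal sketch, assuming SNR = (mean feature - mean background) / stdev(background).
# This convention is an assumption for illustration, not the specification's method.
from statistics import mean, stdev
from typing import Sequence


def signal_to_noise(feature_pixels: Sequence[float],
                    background_pixels: Sequence[float]) -> float:
    """Ratio of background-subtracted feature signal to background noise."""
    background_level = mean(background_pixels)
    background_noise = stdev(background_pixels)  # needs at least two values
    if background_noise == 0:
        raise ValueError("background noise is zero; ratio is undefined")
    return (mean(feature_pixels) - background_level) / background_noise


# Hypothetical pixel intensities for one feature and its surrounding background.
snr = signal_to_noise([110, 120, 130], [10, 12, 8, 10])
```

A detection threshold on this ratio would be a separate design choice and is not stated in the definitions above.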

As used herein, the term "processor" refers to a device that performs a set of steps according to a program (e.g., a digital computer). Processors, for example, include Central Processing Units ("CPUs"), electronic devices, and systems for receiving, transmitting, storing and/or manipulating digital data under programmed control.

As used herein, the terms "memory device," and "computer memory" refer to any data storage device that is readable by a computer, including, but not limited to, random access memory, hard disks, magnetic (e.g., floppy) disks, zip disks, compact discs, DVDs, magnetic tape, and the like.

The term "gene" refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide or precursor. It is intended that the term encompass polypeptides encoded by a full length coding sequence, as well as any portion of the coding sequence, so long as the desired activity and/or functional properties (e.g., enzymatic activity, ligand binding, etc.) of the full-length or fragmented polypeptide are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences that are located 5' of the coding region and which are present on the mRNA are referred to as "5' untranslated sequences." The sequences that are located 3' (i.e., "downstream") of the coding region and that are present on the mRNA are referred to as "3' untranslated sequences." The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form of a genetic clone contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide. Where "amino acid sequence" is recited herein to refer to an amino acid sequence of a naturally occurring protein molecule, "amino acid sequence" and like terms, such as "polypeptide" and "protein" are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule. 
In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5' and 3' end of the sequences that are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory sequences such as promoters and enhancers that control or influence the transcription of the gene. The 3' flanking region may contain sequences that direct the termination of transcription, post-transcriptional cleavage and polyadenylation.

As used herein, the terms "nucleic acid molecule encoding," "DNA sequence encoding," and "DNA encoding" refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence. DNA molecules are said to have "5' ends" and "3' ends" because mononucleotides are reacted to make oligonucleotides or polynucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide or polynucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide or polynucleotide, also may be said to have 5' and 3' ends. In either a linear or circular DNA molecule, discrete elements are referred to as being "upstream" or 5' of the "downstream" or 3' elements. This terminology reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA strand. The promoter and enhancer elements that direct transcription of a linked gene are generally located 5' or upstream of the coding region. However, enhancer elements can exert their effect even when located 3' of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3' or downstream of the coding region.

As used herein, the terms "an oligonucleotide having a nucleotide sequence encoding a gene" and "polynucleotide having a nucleotide sequence encoding a gene" mean a nucleic acid sequence comprising the coding region of a gene or, in other words, the nucleic acid sequence that encodes a gene product. The coding region may be present in either a cDNA, genomic DNA, or RNA form. When present in a DNA form, the oligonucleotide or polynucleotide may be single-stranded (i.e., the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript.

As used herein, the term "regulatory element" refers to a genetic element that controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element that facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements include splicing signals, polyadenylation signals, termination signals, etc. As used herein, the terms "complementary" and "complementarity" are used in reference to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "A-G-T" is complementary to the sequence "T-C-A." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification and hybridization reactions, as well as detection methods that depend upon binding between nucleic acids.
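
By way of illustration only, the base-pairing rules and the distinction between "partial" and "complete" complementarity may be sketched as follows. The function names are hypothetical, the sketch handles DNA bases only, and the two strands are assumed to be written so that paired bases sit at the same position, as in the "A-G-T" and "T-C-A" example above.

```python
# Minimal sketch of DNA base-pairing rules (A pairs with T, G pairs with C).
# Names are illustrative assumptions, not terms from the specification.

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def fraction_complementary(strand_a: str, strand_b: str) -> float:
    """Fraction of aligned positions at which the two strands base-pair.

    1.0 corresponds to "complete" complementarity; values between 0 and 1
    correspond to "partial" complementarity.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        PAIRS[a] == b for a, b in zip(strand_a.upper(), strand_b.upper())
    )
    return paired / len(strand_a)
```

For example, `complement("AGT")` yields `"TCA"`, and comparing `"AGT"` with `"TCG"` gives two paired positions out of three.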

The art knows well that numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term "substantially homologous" refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described above. A gene may produce multiple RNA species that are generated by differential splicing of the primary RNA transcript. cDNAs that are splice variants of the same gene will contain regions of sequence identity or complete homology (representing the presence of the same exon or portion of the same exon on both cDNAs) and regions of complete non-identity (for example, representing the presence of exon "A" on cDNA 1 wherein cDNA 2 contains exon "B" instead). Because the two cDNAs contain regions of sequence identity they will both hybridize to a probe derived from the entire gene or portions of the gene containing sequences found on both cDNAs; the two splice variants are therefore substantially homologous to such a probe and to each other.

When used in reference to a single-stranded nucleic acid sequence, the term "substantially homologous" refers to any probe that can hybridize to (i.e., it is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

As used herein, the term "hybridization" is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_m of the formed hybrid, and the G:C ratio within the nucleic acids.

As used herein, the term "T_m" is used in reference to the "melting temperature." The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the T_m of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the T_m value may be calculated by the equation: T_m = 81.5 + 0.41(% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl (See e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of T_m.
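
By way of illustration only, the simple estimate T_m = 81.5 + 0.41(% G + C) may be sketched as follows. The function names are hypothetical. The sketch implements only this simple estimate for a nucleic acid in aqueous solution at 1 M NaCl, with the result taken to be in degrees Celsius; it does not account for sequence length, salt concentration, or structural characteristics.

```python
# Minimal sketch of the simple T_m estimate quoted above:
#   T_m = 81.5 + 0.41 * (% G + C)
# Names are illustrative assumptions, not terms from the specification.

def gc_percent(seq: str) -> float:
    """Percentage of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(base in "GC" for base in seq)
    return 100.0 * gc_count / len(seq)


def estimate_tm_celsius(seq: str) -> float:
    """Simple T_m estimate (1 M NaCl, aqueous solution)."""
    return 81.5 + 0.41 * gc_percent(seq)
```

Under this estimate, a sequence with 50% G + C content gives approximately 102, and each additional percentage point of G + C raises the estimate by 0.41.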

As used herein the term "stringency" is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Those skilled in the art will recognize that "stringency" conditions may be altered by varying the parameters just described either individually or in concert. With "high stringency" conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences (e.g., hybridization under "high stringency" conditions may occur between homologs with about 85-100% identity, preferably about 70-100% identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic acids with an intermediate frequency of complementary base sequences (e.g., hybridization under "medium stringency" conditions may occur between homologs with about 50-70% identity). Thus, conditions of "weak" or "low" stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

The term "selectively hybridize" means that, for particular identical sequences, a substantial portion of the particular identical sequences hybridize to a given desired sequence or sequences, and a substantial portion of the particular identical sequences do not hybridize to other undesired sequences. A "substantial portion of the particular identical sequences" in each instance refers to a portion of the total number of the particular identical sequences, and it does not refer to a portion of an individual particular identical sequence. In certain embodiments, "a substantial portion of the particular identical sequences" means at least 70% of the particular identical sequences. In certain embodiments, "a substantial portion of the particular identical sequences" means at least 80% of the particular identical sequences. In certain embodiments, "a substantial portion of the particular identical sequences" means at least 90% of the particular identical sequences. In certain embodiments, "a substantial portion of the particular identical sequences" means at least 95% of the particular identical sequences.

"Amplification" is a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (i.e., replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of "target" specificity. Target sequences are "targets" in the sense that they are sought to be sorted out from other nucleic acid. Amplification techniques have been designed primarily for this sorting out. Template specificity is achieved in most amplification techniques by the choice of enzyme. Amplification enzymes are enzymes that, under the conditions in which they are used, will process only specific sequences of nucleic acid in a heterogeneous mixture of nucleic acid. For example, in the case of Qβ replicase, MDV-1 RNA is the specific template for the replicase (Kacian et al., Proc. Natl. Acad. Sci. USA, 69:3038 [1972], herein incorporated by reference). Similarly, in the case of T7 RNA polymerase, this amplification enzyme has a stringent specificity for its own promoters (Chamberlin et al., Nature, 228:227 [1970], herein incorporated by reference). In the case of T4 DNA ligase, the enzyme will not ligate the two oligonucleotides or polynucleotides, where there is a mismatch between the oligonucleotide or polynucleotide substrate and the template at the ligation junction (Wu and Wallace, Genomics, 4:560 [1989], herein incorporated by reference). Finally, Taq and Pfu polymerases, by virtue of their ability to function at high temperature, are found to display high specificity for the sequences bounded and thus defined by the primers; the high temperature results in thermodynamic conditions that favor primer hybridization with the target sequences and not hybridization with non-target sequences (H. A. Erlich (ed.), PCR Technology, Stockton Press [1989], herein incorporated by reference).

As used herein, the term "amplifiable nucleic acid" is used in reference to nucleic acids that may be amplified by any amplification method. It is contemplated that "amplifiable nucleic acid" will usually comprise "sample template."

As used herein, the term "sample template" refers to nucleic acid originating from a sample that is analyzed for the presence of "target" (defined below). In contrast, "background template" is used in reference to nucleic acid other than sample template that may or may not be present in a sample. Background template is most often inadvertent. It may be the result of carryover, or it may be due to the presence of nucleic acid contaminants sought to be purified away from the sample. For example, nucleic acids from organisms other than those to be detected may be present as background in a test sample.

As used herein, the term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

As used herein, the term "probe" refers to a molecule (e.g., an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification), that is capable of hybridizing to another molecule of interest (e.g., another oligonucleotide). When probes are oligonucleotides they may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular targets (e.g., gene sequences). In some embodiments, it is contemplated that probes used in the present invention are labeled with any "reporter molecule," so that the probe is detectable in any detection system, including, but not limited to enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended that the present invention be limited to any particular label. With respect to microarrays, the term probe is used to refer to any hybridizable material that is affixed to the microarray for the purpose of detecting "target" sequences in the analyte.

As used herein "probe element" and "probe site" refer to a plurality of probe molecules (e.g., identical probe molecules) affixed to a microarray substrate. Probe elements containing different characteristic molecules are typically arranged in a two-dimensional array, for example, by microfluidic spotting techniques or by patterned photolithographic synthesis, etc.

As used herein, the term "target," when used in reference to hybridization assays, refers to the molecules (e.g., nucleic acid) to be detected. Thus, the "target" is sought to be sorted out from other molecules (e.g., nucleic acid sequences) or is to be identified as being present in a sample through its specific interaction (e.g., hybridization) with another agent (e.g., a probe oligonucleotide). A "segment" is defined as a region of nucleic acid within the target sequence.

A "target nucleic acid sequence" is a sequence in a sample that is not a known control gene that is added to the sample. In certain embodiments, a target nucleic acid sequence serves as a template for amplification in a PCR reaction. In certain embodiments, a target nucleic acid sequence is a portion of a larger nucleic acid sequence. In certain embodiments, a target nucleic acid sequence is a portion of a gene.

As used herein, the term "polymerase chain reaction" ("PCR") refers to the methods described in U.S. Patent Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference, that describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified." In addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the appropriate set of primer molecules. 
In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications. With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by the device and systems of the present invention. As used herein, the terms "PCR product," "PCR fragment," and "amplification product" refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.
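
By way of illustration only, the accumulation of amplified segments over repeated cycles of denaturation, annealing, and extension may be sketched with an idealized model. The model below is an assumption made for this sketch and is not stated in the definition above: it supposes that each cycle multiplies the number of target copies by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling. The function and parameter names are hypothetical.

```python
# Minimal sketch of idealized PCR amplification.
# Assumption: copies after n cycles = initial_copies * (1 + efficiency) ** n.
# Real reactions plateau as reagents are consumed; that is not modeled here.

def copies_after_cycles(initial_copies: float,
                        cycles: int,
                        efficiency: float = 1.0) -> float:
    """Estimated number of target copies after a given number of cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single starting copy after 30 cycles of perfect doubling.
single_copy_after_30 = copies_after_cycles(1, 30)
```

Under perfect doubling, a single copy of a target sequence yields 2^30 (about one billion) copies after 30 cycles, which is consistent with the statement above that a single copy can be amplified to a detectable level.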

As used herein, the term "amplification reagents" refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.), needed for amplification except for primers, nucleic acid template, and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).

As used herein, the terms "reverse-transcriptase" and "RT-PCR" refer to a type of PCR where the starting material is mRNA. The starting mRNA is enzymatically converted to complementary DNA or "cDNA" using a reverse transcriptase enzyme. The cDNA is then used as a "template" for a "PCR" reaction.

A "multiplex amplification reaction" is an amplification reaction in which two or more target nucleic acid sequences are amplified in the same reaction. A "multiplex polymerase chain reaction" or "multiplex PCR" is a polymerase chain reaction method in which two or more target nucleic acid sequences are amplified in the same reaction.

A "singleplex amplification reaction" is an amplification reaction in which only one target nucleic acid sequence is amplified in the reaction. A "singleplex polymerase chain reaction" or "singleplex PCR" is a polymerase chain reaction method in which only one target nucleic acid sequence is amplified in the reaction.

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

As used herein, the term "recombinant DNA molecule" refers to a DNA molecule that is comprised of segments of DNA joined together by means of molecular biological techniques.

As used herein, the term "antisense" is used in reference to RNA sequences that are complementary to a specific RNA sequence (e.g., mRNA). Included within this definition are antisense RNA ("asRNA") molecules involved in gene regulation by bacteria. Antisense RNA may be produced by any method, including synthesis by splicing the gene(s) of interest in a reverse orientation to a viral promoter that permits the synthesis of a coding strand. Once introduced into an embryo, this transcribed strand combines with natural mRNA produced by the embryo to form duplexes. These duplexes then block either the further transcription of the mRNA or its translation. In this manner, mutant phenotypes may be generated. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. The designation (-) (i.e., "negative") is sometimes used in reference to the antisense strand, with the designation (+) sometimes used in reference to the sense (i.e., "positive") strand.

The term "isolated" when used in relation to a nucleic acid, as in "an isolated oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell genome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded), but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).

As used herein the term "coding region" when used in reference to a structural gene refers to the nucleotide sequences that encode the amino acids found in the nascent polypeptide as a result of translation of a mRNA molecule. The coding region is bounded, in eukaryotes, on the 5' side by the nucleotide triplet "ATG" that encodes the initiator methionine and on the 3' side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA).

As used herein, the terms "purified" and "to purify" refer to the removal of contaminants from a sample.
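By way of non-limiting illustration only, the coding-region definition above can be sketched in Python. The function name `find_coding_region` and the example sequence are hypothetical and are not taken from the present disclosure; the sketch only considers the first ATG on the sense strand and reads in-frame triplets until one of the three stop codons is reached.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_coding_region(sense_strand):
    """Return (start, end) for the first ATG...stop region, or None.

    start is the index of the A of ATG; end is the index just past the
    stop codon, so sense_strand[start:end] spans ATG through the stop.
    Only the first ATG is tried (a simplification for illustration).
    """
    seq = sense_strand.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Walk codon by codon, in frame with the ATG.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None

example = "ccATGGCTTGGTAAgg"   # hypothetical sequence
region = find_coding_region(example)
if region is not None:
    print(example[region[0]:region[1]].upper())
```

The half-open (start, end) convention is a design choice of this sketch so that the returned pair can be used directly as a slice.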

As used herein the term "portion" when in reference to a nucleotide sequence (as in "a portion of a given nucleotide sequence") refers to fragments of that sequence. The fragments may range in size from four nucleotides to the entire nucleotide sequence minus one nucleotide.

The terms "recombinant protein" and "recombinant polypeptide" as used herein refer to a protein molecule that is expressed from a recombinant DNA molecule. As used herein the term "biologically active polypeptide" refers to any polypeptide which maintains a desired biological activity.

As used herein the term "portion" when in reference to a protein (as in "a portion of a given protein") refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid. As used herein, the terms "microbe" and "microbial" refer to microorganisms. In particularly preferred embodiments, the microbes identified using the present invention are bacteria (i.e., eubacteria and archaea). However, it is not intended that the present invention be limited to bacteria, as other microorganisms are also encompassed within this definition, including fungi, viruses, and parasites (e.g., protozoans and helminths). As used herein, the term "reference DNA" refers to DNA that is obtained from a known organism (i.e., a reference strain). In some embodiments of the invention, the reference DNA comprises random genome fragments. In particularly preferred embodiments, the genome fragments are of approximately 1 to 2 kb in size. Thus, in preferred embodiments, the reference DNA of the present invention comprises mixtures of genomes from multiple reference strains.

As used herein, the term "multiple reference strains" refers to the use of more than one reference strain in an analysis. In some embodiments, multiple reference strains within the same species are used, while in other embodiments, "multiple reference strains" refers to the use of multiple species within the same genus, and in still further embodiments, the term refers to the use of multiple species within different genera.

As used herein, the terms "test DNA" and "sample DNA" refer to the DNA to be analyzed using the method of the present invention. In preferred embodiments, this test DNA is tested in the competitive hybridization methods of the present invention, in which reference DNA(s) from multiple reference strains is/are used.

The terms "sample" and "specimen" in the present specification and claims are used in their broadest sense. On the one hand, they are meant to include a specimen or culture. On the other hand, they are meant to include both biological and environmental samples. These terms encompass all types of samples obtained from humans and other animals, including but not limited to, body fluids such as urine, blood, fecal matter, cerebrospinal fluid (CSF), semen, and saliva, as well as solid tissue. These terms also refer to swabs and other sampling devices that are commonly used to obtain samples for culture of microorganisms. Biological samples may be animal, including human, fluid or tissue, food products and ingredients such as dairy items, vegetables, meat and meat byproducts, and waste. Environmental samples include environmental material such as surface matter, soil, water, and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, disposable, and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to the design and manufacture of microarrays for use in detecting multiple organisms in a sample in a high throughput manner. In particular, the present invention relates to the detection of waterborne pathogens in a sample of water. Preferably, the microarrays comprise nucleic acid probes for multiple target organisms. The probes are selected by generating a maximum number of nonoverlapping probes of a predetermined length from a gene sequence derived from the target organism, comparing the sequence of each nonoverlapping probe to sequences from a plurality of database microorganisms, identifying at least one subregion in the gene sequence derived from the target microorganism, wherein the subregion contains exact matches to less than a predetermined number of the database microorganisms; and generating a set of overlapping probes of a predetermined length for said at least one subregion.

High throughput molecular tools for the detection and identification of pathogens are in increasing demand for numerous applications including biodefense, food and water safety. The present invention embodies, in part, an in situ synthesized oligonucleotide probe biochip developed for the detection of 12 pathogens. A redundant set of 791 probes was selected targeting 47 virulence and marker genes (VMGs). Overall, 95% of the probes behaved as predicted based on hybridizations with a composite sample of all VMGs. After experimental evaluation, 673 probes (85% of the initial probe set) were selected for the detection of VMGs in environmental samples. A combination of multiplex PCR and the microarray allowed the detection of the pathogens at a relative abundance level between 0.1% and 0.01% for most of the VMGs. In-depth analysis of hybridization patterns showed that optimal selectivity is achieved with probes with a hybridization free energy of -19.3 kcal/mole and a G+C content of 47.2%. It was calculated that for an optimal probe set with the above-described characteristics, 16 probes are needed to obtain calls with type I and type II error rates of less than 0.01. This study demonstrates that microarrays with optimized and redundant probe sets, combined with statistical evaluation of presence/absence calls, are a powerful tool for environmental analysis.

As an exemplary embodiment, the present invention encompasses a microarray (VMG biochip) using an in situ synthesized microfluidic platform targeting 93 virulence and marker genes (VMGs) in the following pathogens: Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, and E. coli. Each VMG was targeted by 17 probes on average. Many VMGs are targeted in more than one region. This design allows for redundancy in organism detection by targeting more than one VMG in an organism and redundancy in VMG detection by targeting VMGs in more than one region. In preferred embodiments, the virulence marker genes are selected from the genes listed in the table in Fig. 8. Fig. 8 also provides the number of probes for each gene on an exemplary chip. Fig. 9 provides a description of which targets were amplified in multiplex PCR reactions and the specificity and sensitivity of the biochip design.

The biochips of the present invention were validated by spiking DNA of 12 pathogens into environmental background DNA extracted from river water, tertiary effluent, and tap water separately. The VMG targets were enriched via multiplex PCR to improve the detection limit, labeled, and allowed to hybridize on the biochip. The chip allowed detection of 11 of the 12 pathogens presented to the biochip in environmental backgrounds. The targets of the undetected organism had a relatively low G+C content. Overall, the VMG biochip is robust and could be utilized for pathogen detection in environmental samples.

It is contemplated that the methods of the present invention result in highly reliable pathogen detection. This reliability is accomplished by increasing the redundancy of the detection approach. In preferred embodiments, redundancy is achieved by using detection methods that use probes to multiple genes from a single organism (e.g., a pathogen), multiple amplicons per gene, and multiple probes per amplicon, or combinations thereof. In some embodiments, high sensitivity is obtained by coupling two rounds of amplification. For example, the first round can be multiplex PCR and the second round can be isothermal Klenow amplification.

A. Probe Design and Selection

Traditional methods of probe design use heuristic rules that were developed for solution-based hybridization and are not fully applicable to hybridizations performed on solid substrates. Conventional probe design also fails to consider target dangling-end influence on hybridization behavior. As a result, conventional probe design can erroneously remove probes that will behave well on solid substrates, fail to remove probes that complement problematic target fragments, and can reduce the size of probe sets. The present invention provides a new strategy for probe design and validation of probes for use on high-throughput microarray platforms. The inventive strategy takes advantage of the fact that the availability of specific probes (i.e., those probes that hybridize only to the target of interest and do not cross-hybridize with non-targets) varies along the target sequence.

As a non-limiting example, 16S rDNA probes were designed with the following steps: 1) the number of exact matches in the Ribosomal Database Project II (RDP-II) database maintained at Michigan State University (RDP Release 9) was determined for every non-overlapping 20-mer probe possible in a 16S rDNA gene for an organism of interest (generating about 70 to 75 probes for the target sequence), and 2) additional probes were generated by overlapping 19 bases in the sub-region of 16S rDNA with less than about 10 to 50 exact matches. The probes were validated using targets with specified dangling-ends for 18-mer probes. This demonstrated that the length and composition of target dangling-ends can significantly influence sensitivity and cross hybridization in reference and community assays, and can therefore influence confidence in probe validation.
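Step 1) above can be sketched, for illustration only, in a few lines of Python. The sketch assumes a gene of about 1,500 bases (a typical 16S rDNA length) and replaces the RDP-II query with a plain exact-substring search over a small in-memory dictionary; the function names, the toy sequences, and the organism labels are hypothetical and are not the tools actually used.

```python
def nonoverlapping_probes(gene, length=20):
    """Tile the gene with back-to-back probes; a trailing partial window is dropped."""
    return [(start, gene[start:start + length])
            for start in range(0, len(gene) - length + 1, length)]

def count_exact_matches(probe, database):
    """Count database sequences that contain the probe as an exact substring."""
    return sum(1 for seq in database.values() if probe in seq)

# Toy stand-ins (assumptions): a 1,500-base "gene" and a three-entry "database".
gene = "ACGT" * 375
database = {
    "organism_a": gene,
    "organism_b": "TTTT" * 10,
    "organism_c": "ACGT" * 10,
}

probes = nonoverlapping_probes(gene)
match_counts = [(start, count_exact_matches(seq, database)) for start, seq in probes]
print(len(probes), match_counts[0])
```

With a 1,500-base gene and 20-base probes the tiling yields 75 probes, which is consistent with the range of about 70 to 75 probes stated above. A production workflow would query a sequence database rather than scan strings in memory.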

The specificity of the probes was examined using a single organism (Burkholderia xenovorans LB400), which was hybridized with a complex sample of DNA extracted from an anaerobic bioreactor. Since LB400 is not present in the bioreactor sample, this allowed evaluation of the specificity and sensitivity of LB400 probes. Results showed that specific signal increased as the conservation of probe sequences decreased (as determined by number of exact matches). Over 77% of perfect match probes displayed specificity, sensitivity and thermal melting curve characteristics in complex community samples, demonstrating that the extended probe design of the present invention is effective for retaining a larger number of high quality probes. Furthermore, portions of the sub-region yield greater specificity with as little as 1 bp difference and provide numerous probes for further evaluation. Additionally, amplified 16S rDNA provides enough target DNA to duplex with >900 perfect match probes.

Accordingly, in some preferred embodiments, the present invention provides a microarray comprising nucleic acid probes for a target organism, wherein the probes are selected by generating a maximum number of nonoverlapping probes of a predetermined length from a gene sequence derived from the target organism, comparing the sequence of each nonoverlapping probe to sequences from a plurality of database microorganisms, identifying at least one subregion in the gene sequence derived from a target microorganism, wherein the at least one subregion contains exact matches to less than a predetermined number of the database microorganisms; and generating a set of overlapping probes of a predetermined length for the at least one subregion.

In some preferred embodiments, the number of exact matches is determined for every non-overlapping probe of a predetermined length possible in a gene sequence from an organism of interest. The predetermined length can be varied and can range from 10 bases to about 20, 30, 40, 50, 60, 70, 80, 90 or 100 bases or more. In preferred embodiments, the predetermined length is about 20 bases. The number of exact matches can be obtained for each probe using BLAST tools and public sequence databases (i.e., database sequences for database organisms). This data is then used to identify sub-regions of the genome of the organism of interest that have exact matches to less than a predetermined number of database organisms. The number of database organisms to which there are exact matches can vary from about 1 organism to about 50 organisms, preferably from about 5 to about 30 organisms, and more preferably from about 10 to about 20 organisms. In some preferred embodiments, the sub-regions range from 10 to 300 bases in length, and are more preferably from about 10 to about 20, 50, 75, 100, 150, 200, 250 or 300 bases in length. Panel A of Figure 1 provides a graphical depiction of this analysis for the 16S rDNA sequence.
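The identification of sub-regions described above can be illustrated with the following Python sketch. It assumes that a per-probe exact-match count is already available (for example, from a database search) and merges adjacent probes whose count falls below the predetermined number. The function name `low_match_subregions`, the default threshold of 10, and the example counts are hypothetical values chosen for illustration.

```python
def low_match_subregions(probe_counts, probe_length=20, max_matches=10):
    """Merge adjacent probes whose exact-match count is below max_matches.

    probe_counts: list of (start, count) pairs sorted by start position.
    Returns half-open (start, end) intervals on the gene sequence.
    """
    regions = []
    for start, count in probe_counts:
        if count >= max_matches:
            continue  # probe sequence is too conserved across database organisms
        end = start + probe_length
        if regions and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end)  # extend the previous sub-region
        else:
            regions.append((start, end))
    return regions

# Hypothetical counts for five non-overlapping 20-mer probes.
example_counts = [(0, 120), (20, 4), (40, 7), (60, 300), (80, 2)]
print(low_match_subregions(example_counts))
```

In this example the probes at positions 20 and 40 are merged into a single 40-base sub-region, while the probe at position 80 forms its own 20-base sub-region; both lengths fall within the 10 to 300 base range given above.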

Additional probes are then generated by overlapping sequences of a predetermined number of bases in the identified sub-regions. In preferred embodiments, the sub-regions range from 10 to 300 bases in length, and are more preferably from about 10 to about 20, 50, 75, 100, 150, 200, 250 or 300 bases in length. The sequences used for overlapping the predetermined region are preferably from about 10 to about 50 bases in length, and in some particularly preferred embodiments are from 10 to 30 bases in length, and most preferably about 19 bases in length. Since some portions of the sub-region yield greater specificity with as little as 1 base pair difference, numerous probes are identified. See, Panel A, Figure 1. In some embodiments, the probes are then validated using PCR to ensure sufficient presence of the target. In some embodiments, the optimal length and composition of target dangling-end is determined to minimize hybridization between targets, thus reducing ambiguous hybridization behavior. In some preferred embodiments, targets are fragmented to a predetermined length to eliminate cross hybridization in a community sample and to reduce target-target dangling end hybridization.
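As an illustrative sketch only, overlapping probes for an identified sub-region can be generated as follows, using the preferred values of a 20-base probe and a 19-base overlap so that consecutive probes are shifted by one base. The function name `overlapping_probes`, the placeholder sequence, and the sub-region coordinates are hypothetical.

```python
def overlapping_probes(gene, region_start, region_end, length=20, overlap=19):
    """Tile a sub-region with probes that share `overlap` bases with their neighbor.

    region_start and region_end form a half-open interval on the gene sequence.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than the probe length")
    step = length - overlap
    last_start = region_end - length
    return [(s, gene[s:s + length])
            for s in range(region_start, last_start + 1, step)]

gene = "ACGT" * 25                        # 100-base placeholder sequence
probes = overlapping_probes(gene, 20, 60)  # 40-base sub-region
print(len(probes), probes[0], probes[-1])
```

A 40-base sub-region tiled this way yields 21 probes, compared with 2 non-overlapping probes for the same span, which illustrates how the overlapping design provides numerous candidate probes for further evaluation.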

In certain embodiments, a probe may include Watson-Crick bases or modified bases. Modified bases include, but are not limited to, the AEGIS bases (from Eragen Biosciences), which have been described, e.g., in U.S. Pat. Nos. 5,432,272; 5,965,364; and 6,001,983. In certain embodiments, bases are joined by a natural phosphodiester bond or a different chemical linkage. Different chemical linkages include, but are not limited to, a peptide bond or an LNA linkage, which is described, e.g., in published PCT applications WO 00/56748; and WO 00/66604. In some embodiments, the means for generating the polynucleotide probes, for example, for use on a microarray, is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acids Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:246-248). Other methods for synthesizing probes include those described in U.S. Pat. Nos. 6,965,040 and 6,426,184, both of which are incorporated herein by reference.

Synthetic sequences are typically between about 15 and about 600 bases in length, more typically between about 20 and about 100 bases, most preferably between about 40 and about 70 bases in length.

B. Amplification of Target Nucleic Acid Sequences

In some embodiments, the present invention is directed to the detection of target nucleic acid sequences from a target organism. In certain embodiments, target nucleic acid sequences include RNA and DNA. Exemplary RNA target sequences include, but are not limited to, mRNA, rRNA, tRNA, snRNA, viral RNA, and variants of RNA, such as splicing variants. Exemplary DNA target sequences include, but are not limited to, genomic DNA, plasmid DNA, phage DNA, nucleolar DNA, mitochondrial DNA, chloroplast DNA, cDNA, synthetic DNA, yeast artificial chromosomal DNA ("YAC"), bacterial artificial chromosome DNA ("BAC"), other extrachromosomal DNA, and primer extension products. Target nucleic acid sequences also include, but are not limited to, analogs of both RNA and DNA. Exemplary nucleic acid analogs include, but are not limited to, locked nucleic acids ("LNAs"), peptide nucleic acids ("PNAs"), 8-aza-7-deazaguanine ("PPG's"), and other nucleic acid analogs. Exemplary target nucleic acid sequences include, but are not limited to, chimeras of RNA and DNA.

A variety of methods are available for obtaining a target nucleic acid sequence. When the nucleic acid target is obtained through isolation from a biological matrix, certain isolation techniques include, but are not limited to, (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (e.g., Ausubel et al., eds., Current Protocols in Molecular Biology Volume 1, Chapter 2, Section I, John Wiley & Sons, New York (1993), herein incorporated by reference), in certain embodiments, using an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif); (2) stationary phase adsorption methods (e.g., Boom et al., U.S. Pat. No. 5,234,809; Walsh et al., Biotechniques 10(4): 506-513 (1991), all of which are herein incorporated by reference); and (3) salt-induced nucleic acid precipitation methods (e.g., Miller et al., Nucleic Acids Research, 16(3): 9-10 (1988), herein incorporated by reference), such precipitation methods being typically referred to as "salting-out" methods. In certain embodiments, the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K, or other like proteases. See, e.g., U.S. patent application Ser. No. 09/724,613, herein incorporated by reference. In certain embodiments, a target nucleic acid sequence may be derived from any living, or once living, organism, including but not limited to, a prokaryote, a eukaryote, a plant, an animal, and a virus. In certain embodiments, a target nucleic acid sequence is derived from a human. In certain embodiments, the target nucleic acid sequence may originate from a nucleus of a cell, e.g., genomic DNA, or may be extranuclear nucleic acid, e.g., originate from a plasmid, a mitochondrial nucleic acid, from various RNAs, and the like. 
In certain embodiments, if the sequence from the organism is RNA, it may be reverse-transcribed into a cDNA target nucleic acid sequence. In certain embodiments, the target nucleic acid sequence may be present in a double-stranded or single-stranded form. In some embodiments, target nucleic acid sequences are amplified. In certain embodiments, multiple target nucleic acid sequences can be amplified in the same reaction (e.g., in multiplex amplification reactions). In certain embodiments, more than one different multiplex amplification reaction is performed. In certain embodiments, 5 to 10 different multiplex amplification reactions are performed. In certain embodiments, 10 to 25 different multiplex amplification reactions are performed. In certain embodiments, 25 to 50 different multiplex amplification reactions are performed. In certain embodiments, greater than 50 different multiplex amplification reactions are performed. In certain embodiments, a sufficient number of different amplification reactions can be performed such that all of the target nucleic acid sequences together represent all of the genes in a genome. In certain embodiments, the genome may be derived from any living, or once living organism including but not limited to, a prokaryote, a eukaryote, a plant, an animal, and a virus. In certain embodiments, the genome is prokaryotic. In certain embodiments, a sufficient number of different amplification reactions can be performed such that all of the target nucleic acid sequences together represent most of the genes in a genome. In certain embodiments, a sufficient number of different amplification reactions can be performed such that all of the target nucleic acid sequences together represent all of the nucleic acids in a transcriptome. In certain embodiments, a sufficient number of different amplification reactions can be performed such that all of the target nucleic acid sequences together represent most of the nucleic acids in a transcriptome. 
The term "transcriptome" refers to the activated genes, mRNAs, and/or transcripts found in a particular tissue at a particular time.

Exemplary target nucleic acid sequences include, but are not limited to, amplification products, ligation products, transcription products, reverse transcription products, primer extension products, methylated DNA, and cleavage products. Exemplary amplification products include, but are not limited to, PCR and isothermal products.

Different target nucleic acid sequences may be different portions of a single contiguous nucleic acid or may be on different nucleic acids. Different portions of a single contiguous nucleic acid may or may not overlap.

In certain embodiments, nucleic acids in a sample may be subjected to a cleavage procedure. In certain embodiments, such cleavage products may be target nucleic acid sequences. In certain embodiments, a target nucleic acid sequence is derived from a crude cell lysate. Examples of target nucleic acid sequences include, but are not limited to, nucleic acids from buccal swabs, crude bacterial lysates, blood, skin, semen, hair, bone, mucus, saliva, cell cultures, and tissue biopsies. In some embodiments, one or more target nucleic acid samples are derived from communities of microorganisms. In some embodiments, the community of microorganisms is present in a public water source, such as a well, reservoir, river, stream, or lake.

In certain embodiments, target nucleic acid sequences are obtained from a cell, cell line, tissue, or organism that has undergone a treatment. In certain embodiments, the treatment results in the up-regulation or down-regulation of certain target nucleic acid sequences in treated cells, cell lines, tissues, or organisms.

In certain embodiments, a target nucleic acid sequence is obtained from a single cell. In certain embodiments, a target nucleic acid sequence is obtained from tens of cells. In certain embodiments, a target nucleic acid sequence is extracted from hundreds of cells or more. In certain embodiments, a target nucleic acid sequence is extracted from cells of a single organism. In certain embodiments, a target nucleic acid sequence is extracted from cells of two or more different organisms. In certain embodiments, a target nucleic acid sequence concentration in a PCR reaction ranges from about 1 to about 10,000,000 molecules per reaction.

In certain embodiments, each primer is sufficiently long to prime the template-directed synthesis of the target nucleic acid sequence under the conditions of the amplification reaction. In certain embodiments, the lengths of the primers depend on many factors, including, but not limited to, the desired hybridization temperature between the primers, the target nucleic acid sequence and the complexity of the different target nucleic acid sequences to be amplified, and other factors. In certain embodiments, a primer is about 15 to about 35 nucleotides in length. In certain embodiments, a primer is fewer than 15 nucleotides in length. In certain embodiments, a primer is greater than 35 nucleotides in length. In certain embodiments, a set of primers comprises at least one set of primers which comprises at least one designed portion and at least one random portion. In certain embodiments, the designed portion of a primer set is at the 5' end of the primers. In certain embodiments, the designed portion of a primer set is at the 3' end of the primers. In certain embodiments, the designed portion of a primer set is in the center of the primers. In certain embodiments, the designed portion of a primer set includes two or more designed portions. In certain embodiments, the designed portions of a primer set are located in two or more portions separated by random portions.
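For illustration only, a primer having one designed portion and one random portion can be sketched in Python as shown below. Sequences are written 5' to 3'. The function name `primer_with_random_portion`, the designed sequence, and the random-portion length are hypothetical, and the sketch covers only the two arrangements in which the designed portion is at the 5' end or at the 3' end.

```python
import random

def primer_with_random_portion(designed, random_length, designed_end="5'", rng=None):
    """Build a primer (written 5' to 3') from a designed and a random portion.

    designed_end selects whether the designed portion sits at the 5' or 3' end.
    rng is an optional random.Random instance, useful for reproducible output.
    """
    rng = rng or random.Random()
    random_part = "".join(rng.choice("ACGT") for _ in range(random_length))
    if designed_end == "5'":
        return designed + random_part
    if designed_end == "3'":
        return random_part + designed
    raise ValueError("designed_end must be \"5'\" or \"3'\"")

# Hypothetical 15-base designed portion plus 6 random bases: a 21-base primer,
# within the range of about 15 to about 35 nucleotides mentioned above.
primer = primer_with_random_portion("GATTACAGATTACAG", 6, "5'", random.Random(0))
print(primer, len(primer))
```

Passing a seeded `random.Random` instance is a convenience of this sketch for repeatable examples; in practice a random portion is typically synthesized as a degenerate mixture rather than as one fixed sequence.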

The amplification methods used in the present invention generally use polymerases. In certain embodiments, a polymerase is active at 37°C. In certain embodiments, a polymerase is active at a temperature other than 37°C. In certain embodiments, a polymerase is active at a temperature greater than 37°C. In certain embodiments, a polymerase is active at both 37°C and other temperatures.

In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 42°C. In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 50°C. In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 60°C. In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 70°C. In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 80°C. In certain embodiments, a thermostable polymerase remains active at a temperature greater than about 90°C. Exemplary thermostable polymerases include, but are not limited to, Thermus thermophilus HB8 (described, e.g., in U.S. Pat. No. 5,789,224); mutant Thermus thermophilus HB8, including, but not limited to, Thermus thermophilus HB8 (D18A; F669Y; E683R), Thermus thermophilus HB8 (A271; F669Y; E683W), and Thermus thermophilus HB8 (D18A; F669Y); Thermus oshimai (described, e.g., in U.S. Provisional Application No. 60/334,798, filed Nov. 30, 2001, corresponding to U.S. Application No. 20030194726, Thermus oshimai Nucleic Acid Polymerases, published Oct. 16, 2003); mutant Thermus oshimai, including, but not limited to, Thermus oshimai (G43D; F665Y); Thermus scotoductus (described, e.g., in U.S. Provisional Application No. 60/334,489, filed Nov. 30, 2001); mutant Thermus scotoductus, including, but not limited to, Thermus scotoductus (G46D; F668Y); Thermus thermophilus 1B21 (described, e.g., in U.S. Provisional Application No. 60/336,046, filed Nov. 30, 2001); mutant Thermus thermophilus 1B21, including, but not limited to, Thermus thermophilus 1B21 (G46D; F669Y); Thermus thermophilus GK24 (described, e.g., in U.S. Provisional Application No. 60/336,046, filed Nov. 30, 2001); mutant Thermus thermophilus GK24, including, but not limited to, Thermus thermophilus GK24 (G46D; F669Y); Thermus aquaticus polymerase; mutant Thermus aquaticus polymerase, including, but not limited to, Thermus aquaticus (G46D; F667Y) (AmpliTaq® FS or Taq (G46D; F667Y), described, e.g., in U.S. Pat. No. 5,614,365), Taq (G46D; F667Y; E681I), and Taq (G46D; F667Y; T664N; R660G); Pyrococcus furiosus polymerase; mutant Pyrococcus furiosus polymerase; Thermococcus gorgonarius polymerase; mutant Thermococcus gorgonarius polymerase; Pyrococcus species GB-D polymerase; mutant Pyrococcus species GB-D polymerase; Thermococcus sp. (strain 9°N-7) polymerase; mutant Thermococcus sp. polymerase; Bacillus stearothermophilus polymerase; mutant Bacillus stearothermophilus polymerase; Tsp polymerase; mutant Tsp polymerase; ThermalAce™ polymerase (Invitrogen); Thermus flavus polymerase; mutant Thermus flavus polymerase; Thermus litoralis polymerase; mutant Thermus litoralis polymerase. In certain embodiments, a thermostable polymerase is a mutant of a naturally-occurring polymerase.

Exemplary non-thermostable polymerases include, but are not limited to, DNA polymerase I; mutant DNA polymerase I, including, but not limited to, Klenow fragment and Klenow fragment (3' to 5' exonuclease minus); T4 DNA polymerase; mutant T4 DNA polymerase; T7 DNA polymerase; mutant T7 DNA polymerase; phi29 DNA polymerase; and mutant phi29 DNA polymerase.

In certain embodiments, a polymerase is a processive polymerase. In certain embodiments, a processive polymerase remains associated with the template for two or more nucleotide additions. In certain embodiments, a non-processive polymerase disassociates from the template after the addition of each nucleotide. In certain embodiments, a processive DNA polymerase has a characteristic polymerization rate. In certain embodiments, a processive DNA polymerase has a polymerization rate of between 5 and 300 nucleotides per second. In certain embodiments, a processive DNA polymerase has a higher processivity in the presence of accessory factors, such as one or more additives. In certain embodiments, the processivity of a processive DNA polymerase may be influenced by the presence or absence of accessory single-stranded DNA-binding proteins and helicases. In certain embodiments, the net polymerization rate will depend on the enzyme concentration, because at higher concentrations there are more re-initiation events and thus the net polymerization rate is increased. In certain embodiments, the processive polymerase is Bst polymerase.

"Strand displacement" as used herein refers to the phenomenon in which a chemical, physical, or biological agent causes at least partial dissociation of a nucleic acid that is hybridized to its complementary strand. In certain embodiments, a DNA polymerase is a strand displacement polymerase. In certain embodiments, a processive DNA polymerase is also a strand displacement polymerase, which is capable of displacing a hybridized strand encountered during replication. In certain embodiments, a strand displacement polymerase requires a factor that facilitates strand displacement to be capable of displacing a hybridized strand encountered during replication. In certain embodiments, a strand displacement polymerase is capable of displacing a hybridized strand encountered during replication in the absence of a strand displacement factor. In certain embodiments, the strand displacement polymerase lacks 5' to 3' exonuclease activity.

In certain embodiments, the dissociation of a nucleic acid that is hybridized to its complementary strand occurs in a 5' to 3' direction in conjunction with replication. In certain embodiments, where a primer extension reaction forms a newly synthesized strand while displacing a second nucleic acid strand from the template nucleic acid strand, both the newly synthesized and displaced second nucleic acid strand have the same base sequence, which is complementary to the template nucleic acid strand. In certain embodiments, a molecule comprises both strand displacement activity and another activity. In certain embodiments, a molecule comprises both strand displacement activity and polymerase activity. In certain embodiments, strand displacement activity is the only activity associated with a molecule. Enzymes that possess both strand displacement activity and polymerase activity include, but are not limited to, E. coli DNA polymerase I, the Klenow fragment of DNA polymerase I, the bacteriophage T7 DNA polymerase, the bacteriophage T5 DNA polymerase, the φ29 polymerase, and the Bst polymerase. Certain methods of using enzymes possessing strand displacement activity include, but are not limited to, those described by, e.g., Kornberg, A., DNA Replication, W.H. Freeman & Co., San Francisco, Calif., 1980.

The term "strand displacement replication" refers to nucleic acid replication which involves strand displacement. In certain embodiments, strand displacement is facilitated through the use of a strand displacement factor, such as a helicase. In certain embodiments, a DNA polymerase that can perform a strand displacement replication in the presence of a strand displacement factor is used in strand displacement replication. In certain embodiments, the DNA polymerase does not perform a strand displacement replication in the absence of such a factor. Strand displacement factors useful in strand displacement replication include, but are not limited to, BMRF1 polymerase accessory subunit (Tsurumi et al., J. Virology 67(12):7648-7653 (1993), herein incorporated by reference), adenovirus DNA-binding protein (Zijderveld and van der Vliet, J. Virology 68(2):1158-1164 (1994), herein incorporated by reference), herpes simplex viral protein ICP8 (Boehmer and Lehman, J. Virology 67(2):711-715 (1993); Skaliter and Lehman, Proc. Natl. Acad. Sci. USA 91(22):10665-10669 (1994), all of which are herein incorporated by reference), single-stranded DNA binding proteins (SSB; Rigler and Romano, J. Biol. Chem. 270:8910-8919 (1995), herein incorporated by reference), phage T4 gene 32 protein (Villemain and Giedroc, Biochemistry 35:14395-14404 (1996)), and calf thymus helicase (Siegel et al., J. Biol. Chem. 267:13629-13635 (1992), all of which are herein incorporated by reference). Strand displacement amplification (SDA) reaction methods include, but are not limited to, those described in, e.g., Fraiser et al., U.S. Pat. No. 5,648,211; Cleuziat et al., U.S. Pat. No. 5,824,517; and Walker et al., Proc. Natl. Acad. Sci. U.S.A. 89:392-396 (1992), all of which are herein incorporated by reference.
In certain embodiments, the ability of a polymerase to carry out strand displacement replication can be determined by using the polymerase in a strand displacement replication assay such as those described, e.g., in U.S. Pat. No. 6,642,034, herein incorporated by reference or in a primer-block assay described, e.g., in Kong et al., J. Biol. Chem. 268: 1965-1975 (1993), herein incorporated by reference.

In certain embodiments, an amplification reaction comprises a blend of polymerases. In certain such embodiments, at least one polymerase possesses exonuclease activity. In certain embodiments, none of the polymerases in an amplification reaction possess exonuclease activity. Exemplary polymerases that may be used in an amplification reaction include, but are not limited to, phi29 DNA polymerase, Taq polymerase, Stoffel fragment, Bst DNA polymerase, E. coli DNA polymerase I, the Klenow fragment of DNA polymerase I, the bacteriophage T7 DNA polymerase, the bacteriophage T5 DNA polymerase, and other polymerases known in the art. In certain embodiments, a polymerase is inactive in the reaction composition and is subsequently activated at a given temperature.

In certain embodiments, an amplification reaction composition is formed comprising (a) two or more target nucleic acid sequences, (b) at least one set of primers, and (c) at least one polymerase. In certain embodiments, an amplification reaction composition is formed comprising two or more target nucleic acid sequences, at least one primer set, dNTPs, at least one buffering agent and at least one polymerase. In certain such embodiments, the amplification reaction is incubated under conditions that allow the formation of one or more amplification products. In certain embodiments, the amplification reaction further includes one or more additives. In certain embodiments, no strand displacement factors are required for strand displacement.

In certain embodiments, an amplification reaction composition comprises strand displacement factors. Exemplary strand displacement factors include, but are not limited to, helicases and single stranded DNA binding protein. In certain embodiments, the temperature of the reaction affects strand displacement. In certain embodiments, a temperature of approximately 40°C, 45°C, 50°C, 55°C, 60°C, 65°C, 70°C, 75°C, 80°C, 85°C, or 90°C facilitates strand displacement by allowing segments of double stranded DNA to separate and reanneal.

In certain embodiments, the temperature of the amplification reaction is kept at isothermal reaction conditions. The term "isothermal reaction conditions" refers to conditions wherein the temperature is kept substantially constant. In certain embodiments, isothermal reaction conditions prevent the template DNA from being completely disassociated. In certain embodiments, short primers can hybridize to a double stranded template maintained at an isothermal temperature. In certain such embodiments, the primers that strand invade and anneal to the template DNA can be extended by a strand-displacing DNA polymerase. In certain embodiments, an amplification process is isothermal at 50°C and uses Bst DNA polymerase for strand displacement and extension. In certain embodiments, an amplification reaction uses a fragment of Bst DNA polymerase with the 3'>5' exonuclease activity removed ("the large fragment of Bst DNA polymerase").

Certain amplification methods include, but are not limited to, Random PCR or Primer Extension Preamplification-PCR (PEP-PCR) (Zhang et al., Proc. Natl. Acad. Sci., USA 89: 5847-51 (1992), herein incorporated by reference), Linker Adapter PCR (Miyashita et al., Cytogenet. Cell Genet. 66(1): 54-57 (1994), herein incorporated by reference), Tagged-PCR (Grothues et al., Nuc. Acids Res. 21(5) 1321-1322 (1993), herein incorporated by reference), Inter-Alu-PCR (Bicknell et al., Genomics 10: 186-192 (1991), herein incorporated by reference), Degenerate Oligonucleotide Primed-PCR (DOP-PCR) (Cheung et al., Proc. Natl. Acad. Sci., USA 93: 14676-14679 (1996), herein incorporated by reference), Improved-Primer Extension Preamplification PCR (I-PEP-PCR) (Dietmaier et al., Amer. J. Pathology 154(1): 83-95 (1999), herein incorporated by reference and U.S. Pat. No. 6,365,375, herein incorporated by reference), LL-DOP PCR (Kittler et al., Anal. Biochem. 300:237-244 (2002), herein incorporated by reference), Balanced PCR amplification (Makrigiorgos et al., Nature Biotech. 20:936-939 (2002), herein incorporated by reference), Multiple Displacement Amplification (MDA) (U.S. Pat. Nos. 6,124,120 and 6,280,949, all of which are herein incorporated by reference), and Random Primer Amplification (RPA) (U.S. Pat. No. 5,043,272, herein incorporated by reference). In certain embodiments, multiplex amplification may be used (see, e.g., Published U.S. Patent Application No. 2004-0175733 A1, herein incorporated by reference).

In certain embodiments, multiplex amplification is used to distinguish between target nucleic acid sequences that have single nucleotide polymorphisms ("SNP"). In certain such embodiments, one or more multiplex amplification reactions include one or more primer sets specific for two or more target nucleic acid sequences that differ at only a single nucleotide and are present in similar abundance. In certain such embodiments, the one or more multiplex amplification reactions further include one or more probes with different detectable labels specific for the presence or absence of that particular single nucleotide. In certain such embodiments, the signal from the label in the multiplex amplification reaction is detected as an indicator of the presence of one or more SNPs.

In certain embodiments, multiplex amplification is used for melting curve analysis. In certain such embodiments, a multiplex amplification reaction includes two or more primer sets specific for two or more target nucleic acid sequences of similar abundance and also includes one or more probes that intercalate into double-stranded target nucleic acid sequences and do not bind to single-stranded target nucleic acid sequences. In certain such embodiments, a multiplex amplification reaction includes two or more primer sets specific for two or more target nucleic acid sequences and also includes one or more probes that bind to single-stranded target nucleic acid sequences but do not bind to double-stranded target nucleic acid sequences. In certain such embodiments, the one or more probes include a detectable label, and the label is detectable only when the one or more probes interact with their target nucleic acid sequences. In certain such embodiments, the temperature of the reaction is modified gradually and the signal from the detectable label is monitored such that the shift of the one or more target nucleic acid sequences from single-stranded to double-stranded or from double-stranded to single-stranded as a function of temperature is recorded. In certain such embodiments, the signal from the detectable label is monitored using real-time PCR. In certain embodiments, a multiplex amplification reaction includes two or more primer sets specific for two or more target nucleic acid sequences and also includes two or more probes that bind to single-stranded target nucleic acid sequences but do not bind to double-stranded target nucleic acid sequences. In certain embodiments, the two or more probes include a detectable label, and the label is detectable only when the two or more probes interact with their target nucleic acid sequence(s). 
In certain embodiments, the temperature of the reaction is modified gradually and the signal from the detectable label is monitored such that the shift of one or more target nucleic acid sequences from single-stranded to double-stranded or from double-stranded to single-stranded as a function of temperature is recorded. In certain such embodiments, the signal from the detectable label is monitored using real-time PCR. In certain embodiments, one or more target nucleic acid sequences undergo a treatment before being included in an amplification reaction. In certain embodiments, a target nucleic acid treatment selectively modifies a target nucleic acid according to the methylation state of the target nucleic acid sequence (see, e.g., Published U.S. Patent Application No. 2004-0101843; U.S. Pat. No. 6,265,171; and U.S. Pat. No. 6,331,393, all of which are herein incorporated by reference). In certain embodiments, the sample from which one or more target nucleic acid sequences is derived (e.g., a cell, tissue, etc.) undergoes a treatment prior to the inclusion of the target nucleic acid sequences from that sample in a multiplex amplification reaction. In certain embodiments, the amplification of one or more target nucleic acid sequences from a treated sample is compared to the amplification of one or more target nucleic acid sequences from an untreated control sample. In certain such embodiments, the expression of one or more genes in response to the treatment is determined.
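
By way of illustration only, the melting curve analysis described above, in which the label signal is recorded as a function of temperature, can be reduced to an estimate of the melting temperature by locating the steepest drop in signal. The following Python sketch assumes a probe whose signal falls as double-stranded nucleic acid melts; the function name and the readings are hypothetical and are not taken from this disclosure.

```python
def melting_temperature(temps, signal):
    """Estimate Tm as the midpoint of the interval where -dF/dT is largest.

    temps  : strictly increasing temperatures (degrees Celsius)
    signal : label signal recorded at each temperature
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_tm = None
    best_neg_slope = float("-inf")
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        if dt <= 0:
            raise ValueError("temperatures must be strictly increasing")
        # Negative first derivative of signal with respect to temperature.
        neg_slope = -(signal[i + 1] - signal[i]) / dt
        if neg_slope > best_neg_slope:
            best_neg_slope = neg_slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm


# Hypothetical readings: signal is lost as the duplex melts.
temps = [70, 72, 74, 76, 78, 80, 82, 84]
signal = [100, 98, 95, 90, 60, 20, 12, 10]
tm = melting_temperature(temps, signal)
print(tm)
```

In a multiplex reaction, distinct amplicons would be expected to give distinct peaks in -dF/dT; this sketch reports only the single largest peak.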

In certain embodiments, the products of two or more amplification reactions are combined. In certain such embodiments, the products of one amplification reaction may have a different amplification profile than the products of the second amplification reaction.

In preferred embodiments, multiplex PCR reactions are performed with nucleic acid samples from a community of microorganisms. In particularly preferred embodiments, primers are designed to amplify target regions identified according to the probe design and selection methods described above. The amplified target regions are designated amplicons. In some embodiments, multiple target regions from a gene of interest are amplified to produce multiple amplicons. In further embodiments, multiple target regions from multiple target genes of interest are amplified. In some embodiments, it is necessary to conduct more than one multiplex PCR reaction for each nucleic acid sample so that a large number of target regions can be amplified. For example, five multiplex PCR reactions can be performed, each using ten different primer sets, to amplify fifty different amplicons for assay on a microarray. In some particularly preferred embodiments, the primers have a length of from 18-30 bases, a GC content of from 40-60%, and an annealing temperature of from about 52 to about 58 degrees Celsius. In some embodiments, the primers are designed for detection of multiple waterborne pathogens. In some particularly preferred embodiments, the primers are designed to amplify portions of virulence-associated genes. In some embodiments, about 5, 10, 12, 15 or 20 waterborne pathogens are analyzed. In some embodiments, about 10, 20, 30, 40, 50, 60 or more genes are analyzed, for example, about 10, 20, 30, 40, 50, 60 virulence-associated genes.
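
By way of illustration only, the primer length and GC content ranges stated above can be screened with a short script. The following Python sketch checks only length (18-30 bases) and GC content (40-60%); it does not evaluate annealing temperature, and the function names and example sequences are hypothetical.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a primer sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def meets_primer_criteria(seq, min_len=18, max_len=30, min_gc=40.0, max_gc=60.0):
    """True if the primer falls within the length and GC ranges given in the text."""
    if not seq:
        return False
    return min_len <= len(seq) <= max_len and min_gc <= gc_percent(seq) <= max_gc


# Hypothetical candidate primers, not sequences from this disclosure.
candidates = ["ATGCGTACCGTTAGCTAGGC", "ATATATATATATATATATAT", "GCGC"]
accepted = [p for p in candidates if meets_primer_criteria(p)]
print(accepted)
```

Only the first candidate passes: the second has no G or C bases, and the third is shorter than 18 bases.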

As described above, in preferred embodiments, the assay design is redundant. For each organism assayed, probes are used for multiple genes from each target organism, multiple amplicons from each gene are analyzed, and multiple discrete probes are used for each amplicon. Accordingly, in some embodiments, about 5, 10, 15, 20, 30, 40, or 50 to about 100 genes are analyzed in an assay of the present invention. In further embodiments, about 1, 3, 5, 10, 15 or 20 to about 50 amplicons per gene are amplified and analyzed in an assay of the present invention. In further embodiments, about 1, 3, 5, 10, 15 or 20 to about 50 probes for each amplicon are used in an assay of the present invention. In some embodiments, the probes are attached to a microarray and amplicons are hybridized to the microarray, as described in more detail below.
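
By way of illustration only, the redundant design described above implies a probe count equal to the product of the number of genes, the number of amplicons per gene, and the number of probes per amplicon. The following Python sketch works through that arithmetic under the simplifying assumption that every gene has the same number of amplicons and every amplicon the same number of probes; the function name is hypothetical.

```python
def total_probes(genes, amplicons_per_gene, probes_per_amplicon):
    """Probe count for a uniform redundant design (genes x amplicons x probes)."""
    for value in (genes, amplicons_per_gene, probes_per_amplicon):
        if value < 1:
            raise ValueError("each count must be at least 1")
    return genes * amplicons_per_gene * probes_per_amplicon


# Smallest and largest designs within the ranges stated in the text.
smallest = total_probes(5, 1, 1)
largest = total_probes(100, 50, 50)
# An intermediate design: 10 genes, 3 amplicons per gene, 5 probes per amplicon.
intermediate = total_probes(10, 3, 5)
print(smallest, intermediate, largest)
```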

In certain embodiments, a label is attached to one or more amplicons during either a first amplification with a thermostable polymerase or a second amplification with a non-thermostable polymerase. In some embodiments, the labels are included in the amplification reaction. In preferred embodiments, the labels have one or more of the following properties: (i) provides a detectable signal; (ii) interacts with a second label to modify the detectable signal provided by the second label, e.g., FRET (Fluorescence Resonance Energy Transfer); (iii) stabilizes hybridization, e.g., duplex formation; and (iv) provides a member of a binding complex or affinity set, e.g., affinity, antibody/antigen, ionic complexes, hapten/ligand, e.g., biotin/avidin. In certain embodiments, use of labels can be accomplished using any one of a large number of known techniques employing known labels, linkages, linking groups, reagents, reaction conditions, and analysis and purification methods.

Labels include, but are not limited to, light-emitting, light-scattering, and light-absorbing compounds which generate or quench a detectable fluorescent, chemiluminescent, or bioluminescent signal (see, e.g., Kricka, L. in Nonisotopic DNA Probe Techniques (1992), Academic Press, San Diego, pp. 3-28, and Non-Radioactive Labelling, A Practical Introduction, Garman, A. J. (1997) Academic Press, San Diego). Fluorescent reporter dyes useful as labels include, but are not limited to, fluoresceins (see, e.g., U.S. Pat. Nos. 5,188,934; 6,008,379; and 6,020,481), rhodamines (see, e.g., U.S. Pat. Nos. 5,366,860; 5,847,162; 5,936,087; 6,051,719; and 6,191,278), benzophenoxazines (see, e.g., U.S. Pat. No. 6,140,500), energy-transfer fluorescent dyes, comprising pairs of donors and acceptors (see, e.g., U.S. Pat. Nos. 5,863,727; 5,800,996; and 5,945,526), and cyanines (see, e.g., Kubista, WO 97/45539), as well as any other fluorescent moiety capable of generating a detectable signal. Examples of fluorescein dyes include, but are not limited to, 6-carboxyfluorescein; 2',4',1,4-tetrachlorofluorescein; and 2',4',5',7',1,4-hexachlorofluorescein. In certain embodiments, the fluorescent label is selected from SYBR®-green, 6-carboxyfluorescein ("FAM"), TET, ROX, VIC™, and JOE. In certain embodiments, a label is a radiolabel.

In certain embodiments, labels are hybridization-stabilizing moieties which serve to enhance, stabilize, or influence hybridization of duplexes, e.g. intercalators and intercalating dyes (including, but not limited to, ethidium bromide and SYBR® green), minor-groove binders, and cross-linking functional groups (see, e.g., Blackburn, G. and Gait, M. Eds. "DNA and RNA structure" in Nucleic Acids in Chemistry and Biology, 2nd Edition, (1996) Oxford University Press, pp. 15-81, herein incorporated by reference). In certain embodiments, labels effect the separation or immobilization of a molecule by specific or non-specific capture, for example biotin, digoxigenin, and other haptens (see, e.g., Andrus, A. "Chemical methods for 5' non-isotopic labeling of PCR probes and primers" (1995) in PCR 2: A Practical Approach, Oxford University Press, Oxford, pp. 39-54, herein incorporated by reference).

In certain embodiments, different amplicons comprise detectable and different labels that are distinguishable from one another. For example, in certain embodiments, labels are different fluorophores capable of emitting light at different, spectrally-resolvable wavelengths (e.g., 4-differently colored fluorophores); certain such labeled probes are known in the art and described above, and in, e.g., U.S. Pat. No. 6,140,054, herein incorporated by reference and Saiki et al., 1986, Nature 324: 163-166, herein incorporated by reference. Preferably, the detectable label is a fluorescent label, e.g., by incorporation of nucleotide analogs. Other labels suitable for use in the present invention include, but are not limited to, biotin, iminobiotin, antigens, cofactors, dinitrophenol, lipoic acid, olefinic compounds, detectable polypeptides, electron rich molecules, enzymes capable of generating a detectable signal by action upon a substrate, and radioactive isotopes. Preferred radioactive isotopes include ³²P, ³⁵S, ¹⁴C, ¹⁵N and ¹²⁵I. Fluorescent molecules suitable for the present invention include, but are not limited to, fluorescein and its derivatives, rhodamine and its derivatives, Texas Red, 5'carboxy-fluorescein (FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein (JOE), N,N,N',N'-tetramethyl-6-carboxy-rhodamine (TAMRA), 6'carboxy-X-rhodamine (ROX), HEX, TET, IRD40, and IRD41. Fluorescent molecules that are suitable for the invention further include: cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650, and BODIPY-650/670; and ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568, and ALEXA-594; as well as other fluorescent dyes which will be known to those who are skilled in the art. Electron rich indicator molecules suitable for the present invention include, but are not limited to, ferritin, hemocyanin, and colloidal gold. 
Alternatively, in less preferred embodiments the target polynucleotides may be labeled by specifically complexing a first group to the polynucleotide. A second group, covalently linked to an indicator molecule and which has an affinity for the first group, can be used to indirectly detect the target polynucleotide. In such an embodiment, compounds suitable for use as a first group include, but are not limited to, biotin and iminobiotin. Compounds suitable for use as a second group include, but are not limited to, avidin and streptavidin.

C. Microarrays

In preferred embodiments, probes designed by the methods described above are used on microarrays. A microarray is an array of positionally-addressable binding (e.g., hybridization) sites on a support. Each of such binding sites comprises a plurality of polynucleotide molecules of a probe bound to the predetermined region on the support. Microarrays can be made in a number of ways, of which several are described herein below (see e.g., Meltzer, 2001, Curr. Opin. Genet. Dev. 11(3):258-63; Andrews et al., 2000, Genome Res. 10(12):2030-43; Abdellatif, 2000, Circ. Res. 86(9):919-20; Lennon, 2000, Drug Discov. Today 5(2):59-66; Zweiger, 1999, Trends Biotechnol. 17(11):429-36, all of which are herein incorporated by reference). However produced, microarrays share certain characteristics. The arrays are preferably reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, the microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions. However, both larger and smaller (e.g., 0.5 cm² or less) arrays are also contemplated and may be preferable, e.g., for simultaneously evaluating a very large number of different probes. Any of these methods described herein may be used for making microarrays comprising the probes designed and identified as described above.

In preferred embodiments, methods of the invention utilize polynucleotide probes synthesized directly on the support to form the array. The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material. There are a variety of techniques known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface. For example, photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature BioTechnology 14:1675; U.S. Pat. Nos. 5,489,678; 5,578,832; 5,556,752; 5,510,270; 6,197,506; 5,143,854; 5,424,186 and 6,346,413, all of which are herein incorporated by reference) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., Biosensors & Bioelectronics 11:687-690) may be used. In some particularly preferred embodiments, microarrays are synthesized by the methods described by Gao, X. L., et al., A flexible light-directed DNA chip synthesis gated by deprotection using solution photogenerated acids. Nucleic Acids Research, 2001. 29(22): p. 4744-4750, herein incorporated by reference. See also, U.S. Pat. Publ. 20020081582 and 20030138363 and U.S. Pat. No. 6,375,903, all of which are incorporated herein by reference.

Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nucl. Acids. Res. 20:1679-1684, herein incorporated by reference), may also be used. In principle, and as noted supra, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., supra) could be used. However, as will be recognized by those skilled in the art, very small arrays will frequently be preferred because hybridization volumes will be smaller.

In some embodiments, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in International Patent Publication No. WO 98/41531, published Sep. 24, 1998; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed, Plenum Press, New York at pages 111-123; Hughes et al., 2001, Nature BioTechnology 19:342-347; and U.S. Pat. No. 6,028,189 to Blanchard, all of which are herein incorporated by reference. Specifically, the oligonucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in microdroplets of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i.e., the different probes). Polynucleotide probes are attached to the surface covalently at the 3' end of the polynucleotide.

D. Assays Using Microarrays

In a preferred embodiment of the present invention, sample processing is through hybridization of the amplicons obtained through multiplex PCR on a nucleotide microarray. In a more preferred embodiment, the microarray is an oligonucleotide array. Preferably, the microarray contains in the range of 20 to 50,000 nucleic acid probes. The probes are identified by the methods described in detail above. The probes can be arranged in a variety of patterns. For example, the probes can be arranged in rows and columns, polygonal (e.g., hexagonal), or circular patterns, etc. In preferred embodiments, the microarrays comprise a redundant set of probes. For each organism assayed, the microarray comprises probes for multiple genes from each target organism. In some embodiments, the microarray further comprises probes to multiple amplicons from each of the multiple genes that are analyzed. In further embodiments, the microarray comprises multiple discrete probes for each amplicon.

Preferably, the density of probes on a microarray is about 100 different (i.e., non-identical) probes per 1 cm² or higher. More preferably, a microarray used in the methods of the invention will have at least 550 probes per 1 cm², at least 1000 probes per 1 cm², at least 1500 probes per 1 cm² or at least 2000 probes per 1 cm². In a particularly preferred embodiment, the microarray is a high density array, preferably having a density of at least about 2500 different probes per 1 cm². The microarrays used in the invention therefore preferably contain at least 2500, at least 5000, at least 10000, at least 15000, at least 20000, at least 25000, at least 50000 or at least 55000 different (i.e., non-identical) probes.
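
By way of illustration only, the relationship between the probe densities and probe counts stated above can be expressed as a simple area calculation. The following Python sketch assumes probes are laid out at a uniform density across the array; the function name is hypothetical.

```python
def array_area_cm2(n_probes, probes_per_cm2):
    """Array area (cm^2) occupied by n_probes at a uniform probe density."""
    if n_probes < 0 or probes_per_cm2 <= 0:
        raise ValueError("probe count must be >= 0 and density must be > 0")
    return n_probes / probes_per_cm2


# At the high-density figure of 2500 probes per cm^2:
area_2500 = array_area_cm2(2500, 2500)    # smallest probe count listed
area_55000 = array_area_cm2(55000, 2500)  # largest probe count listed
# The same 2500 probes at the lowest stated density of 100 probes per cm^2:
area_low_density = array_area_cm2(2500, 100)
print(area_2500, area_55000, area_low_density)
```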

Such polynucleotide probes are preferably of the length of 15 to 200 bases, more preferably of the length of 20 to 100 bases, most preferably 40-60 bases. It will be understood that each probe sequence may also comprise a linker (e.g., spacer) in addition to the sequence that is complementary to its target sequence. As used herein, a linker refers to a chemical structure between the sequence that is complementary to its target sequence and the surface. The linker need not be a nucleotide sequence. For example, the linker can be composed of a nucleotide sequence, or peptide nucleic acids, hydrocarbon chains, etc. It will be appreciated that when a sample of target nucleic acid molecules, e.g., cDNA complementary to the RNA of a cell is made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array will reflect the prevalence of the corresponding complementary sequences in the sample. For example, when detectably labeled (e.g., with a fluorophore) amplicon is hybridized to a microarray, the site on the array corresponding to a nucleotide sequence that is not in the sample will have little or no signal (e.g., fluorescent signal), and a nucleotide sequence that is prevalent in the sample will have a relatively strong signal. The relative abundance of different nucleotide sequences in a sample may be determined by the signal strength pattern of probes on a microarray. The target polynucleotides may be from any source. For example, the target polynucleotide molecules may be naturally occurring nucleic acid molecules such as genomic or extragenomic DNA molecules isolated from an organism, or RNA molecules, such as mRNA molecules, isolated from an organism. 
Alternatively, the polynucleotide molecules may be synthesized, including, e.g., nucleic acid molecules synthesized enzymatically in vivo or in vitro, such as cDNA molecules, or polynucleotide molecules synthesized by PCR, RNA molecules synthesized by in vitro transcription, etc. The sample of target polynucleotides can comprise, e.g., molecules of DNA, RNA, or copolymers of DNA and RNA. In preferred embodiments, the target polynucleotides of the invention will correspond to particular genes or to particular gene transcripts (e.g., to particular mRNA sequences expressed in cells or to particular cDNA sequences derived from such mRNA sequences). However, in many embodiments, particularly those embodiments wherein the polynucleotide molecules are derived from mammalian cells, the target polynucleotides may correspond to particular fragments of a gene transcript. For example, the target polynucleotides may correspond to different exons of the same gene, e.g., so that different splice variants of that gene may be detected and/or analyzed. In preferred embodiments, the target polynucleotides to be analyzed are prepared in vitro from nucleic acids extracted from cells. For example, in one embodiment, RNA is extracted from cells (e.g., total cellular RNA, poly(A)+ messenger RNA, fraction thereof) and messenger RNA is purified from the total extracted RNA. Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e.g. in Sambrook et al., supra. In one embodiment, RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation and an oligo dT purification (Chirgwin et al., 1979, Biochemistry 18:5294-5299). In another embodiment, total RNA is extracted from cells using guanidinium thiocyanate lysis followed by purification on RNeasy columns (Qiagen). cDNA is then synthesized from the purified mRNA using, e.g., oligo-dT or random primers. 
In preferred embodiments, the target polynucleotides are cRNA prepared from cDNA prepared from purified mRNA or from total RNA extracted from cells. As used herein, cRNA can either be complementary to (anti-sense) or of the same sequence (sense) as the sample RNA. The extracted RNA molecules are amplified using a process in which double-stranded cDNA molecules are synthesized from the sample RNA molecules using primers linked to an RNA polymerase promoter. As a result, RNA polymerase promoters can be incorporated into either or both strands of the cDNA. Using the RNA polymerase promoter that is on the first strand of the cDNA molecule, cRNA can be synthesized that is the same sequence as the sample RNA. To synthesize cRNA complementary to the sample RNA, transcription can be initiated from the RNA polymerase promoter that is on the second strand of the double-stranded cDNA molecule using an RNA polymerase (see, e.g., U.S. Pat. Nos. 5,891,636; 5,716,785; 5,545,522 and 6,132,997; see also, U.S. Pat. No. 6,271,002 and U.S. Provisional Patent Application Ser. No. 60/253,641, filed on Nov. 28, 2000, by Ziman et al., all of which are herein incorporated by reference). Either oligo-dT primers (U.S. Pat. Nos. 5,545,522 and 6,132,997, herein incorporated by reference) or random primers (U.S. Provisional Patent Application Ser. No. 60/253,641, filed on Nov. 28, 2000, by Ziman et al., herein incorporated by reference) that contain an RNA polymerase promoter or complement thereof can be used. Preferably, the target polynucleotides are short and/or fragmented polynucleotide molecules which are representative of the original nucleic acid population of the cell. In one embodiment, total RNA is used as input for cRNA synthesis. An oligo-dT primer containing a T7 RNA polymerase promoter sequence can be used to prime first strand cDNA synthesis. When second strand synthesis is desired, random hexamers can be used to prime second strand cDNA synthesis by a reverse transcriptase. 
This reaction yields a double-stranded cDNA that contains the T7 RNA polymerase promoter at the 3' end. The double-stranded cDNA can then be transcribed into cRNA by T7 RNA polymerase. In preferred embodiments, the target nucleic acids are amplified as described in detail above.

The target polynucleotides to be analyzed are preferably detectably labeled. For example, cDNA can be labeled directly, e.g., with nucleotide analogs, or indirectly, e.g., by making a second, labeled cDNA strand using the first strand as a template.

Alternatively, the double-stranded cDNA can be transcribed into cRNA and labeled. In some preferred embodiments, amplicons are labeled by including suitable labeled substrates during amplification with either thermostable or non-thermostable polymerases. In preferred embodiments, nucleic acid hybridization and wash conditions are chosen so that the polynucleotide molecules to be analyzed (or target polynucleotide molecules) specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to one or more specific array sites, wherein its complementary sequence is located.

Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self-complementary sequences. Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of probe and target nucleic acids. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al. (supra), and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York, herein incorporated by reference. For example, when cDNA microarrays are used, typical hybridization conditions are hybridization in 5X SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25 °C in low stringency wash buffer (1X SSC plus 0.2% SDS), followed by 10 minutes at 25 °C in higher stringency wash buffer (0.1X SSC plus 0.2% SDS) (Hughes et al., 2001, Nature BioTechnology 19:342-347). Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V. and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif., herein incorporated by reference.

In some preferred embodiments, when fluorescently labeled probes are used, the fluorescence emissions at each site of a transcript array can be, preferably, detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, Genome Res. 6:639-645, herein incorporated by reference). In a preferred embodiment, the arrays are scanned with a laser fluorescence scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Such fluorescence laser scanning devices are described, e.g., in Schena et al., 1996, Genome Res. 6:639-645. Alternatively, the fiber-optic bundle described by Ferguson et al., 1996, Nature BioTechnology 14:1681-1684, herein incorporated by reference, may be used to monitor mRNA abundance levels at a large number of sites simultaneously.

Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit or 16 bit analog to digital board. In one embodiment, the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for cross talk (or overlap) between the channels for the two fluors may be made. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated.
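By way of illustration only, the cross talk correction and two-fluorophore ratio described above can be sketched as follows. The sketch assumes a simple linear mixing model in which a fixed fraction of each fluorophore's emission appears in the other channel; the function name, the parameter names, and the bleed-through fractions are hypothetical and are not taken from the described method.

```python
def crosstalk_corrected_ratio(obs1, obs2, bleed_1_to_2=0.0, bleed_2_to_1=0.0):
    """Return the channel-1 / channel-2 emission ratio for one array site.

    Assumed (hypothetical) linear mixing model:
        obs1 = true1 + bleed_2_to_1 * true2
        obs2 = true2 + bleed_1_to_2 * true1
    The bleed fractions would have to be determined experimentally.
    """
    det = 1.0 - bleed_1_to_2 * bleed_2_to_1
    if det <= 0:
        raise ValueError("bleed-through fractions are too large to invert")
    # Invert the 2x2 mixing model to recover the per-fluorophore signals.
    true1 = (obs1 - bleed_2_to_1 * obs2) / det
    true2 = (obs2 - bleed_1_to_2 * obs1) / det
    if true2 <= 0:
        return None  # ratio undefined when the reference channel is empty
    return true1 / true2
```

With both bleed-through fractions left at zero, the function reduces to the plain ratio of the two recorded intensities.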

In some preferred embodiments, arrays are designed for the high-throughput analysis of multiple waterborne pathogens. In some embodiments, the assays are configured to detect the presence of from about 5, 10, 15, or 20 to about 30 different waterborne pathogens. In some preferred embodiments, the assays of the present invention are designed to detect the presence of at least one of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, and Yersinia enterocolitica, and combinations thereof. In some preferred embodiments, target regions of Virulence and Marker Genes (VMGs) are amplified and then hybridized to a microarray comprising probes that hybridize to the VMGs. In some preferred embodiments, the microarray comprises about 5, 10, 15, 20, 25, or 30 to about 100 different probes for each VMG for each organism. As described above, in some embodiments, the probes to multiple VMGs from a single organism are included on the microarray and multiple different probes per amplicon from a given target VMG are included on the microarray. In preferred embodiments, the probes are identified as described in detail above. Figure 7 provides a list of exemplary probes identified by the methods described herein and that are suitable for use on microarrays of the present invention. Accordingly, in some embodiments, the microarrays of the present invention comprise any of the probes encoded by SEQ ID NOs: 1-791, or combinations thereof.

In some embodiments, the arrays are analyzed using the system described in co-pending application No. [number illegible], entitled Electroluminescent-Based Fluorescence Detection Device, filed November 22, 2006, incorporated herein by reference. In particular, the device described below can be utilized in combination with the primers, probes and microarrays described herein. For example, the present invention provides fluorescence detection devices comprising an electroluminescent light (EL) source that provides static and/or real-time fluorescent read-outs in a number of formats including visual and digital. In further examples, the present invention provides fluorescence detection devices comprising an electroluminescent light (EL) source that provides PCR assay capabilities, such as thermal cycling and/or isothermal amplification assays, computational capabilities for data read-outs, and read-out capabilities in a number of formats including visual and digital.

It is not intended that the present invention be limited by the nature of the reactions carried out in the electroluminescent fluorescence detection device. Reactions include, but are not limited to, chemical and biological reactions. Biological reactions include, but are not limited to mRNA transcription, nucleic acid amplification, DNA amplification, cDNA amplification, sequencing, and the like. It is also not intended that the invention be limited by the particular purpose for carrying out the biological reactions. In one diagnostic application, it may be desirable to simply detect the presence or absence of a particular pathogen. In another diagnostic application, it may be desirable to simply detect the presence or absence of specific allelic variants of pathogens in a clinical sample. For example, different species or subspecies of bacteria may have different susceptibilities to antibiotics; rapid identification of the specific species or subspecies present aids diagnosis and allows initiation of appropriate treatment. The present invention provides a device, comprising, a) an electroluminescent light source, b) an excitation filter, c) a biological sample holder, and d) an emission filter, wherein said biological sample holder, is disposed between said excitation filter and said emission filter and said electroluminescent light source is adjacent to said excitation filter so that light produced by said electroluminescent light source passes through said excitation filter to illuminate said biological sample holder. The present invention is not limited to a particular electroluminescent light source. Indeed, a variety of electroluminescent light sources may be incorporated, including, but not limited to a blue, blue-green and green electroluminescent film. 
Indeed, a variety of emission filters and excitation filters may be incorporated, including, but not limited to, Super Gel filters; in any case, the emission filter and excitation filter should be optically compatible with the electroluminescent light source and a target fluorescent molecule. The present invention is not limited to a particular biological sample holder. Indeed, a variety of biological sample holders may be used, including, but not limited to, a biological sample holder of the present invention. In one embodiment, the biological sample holder is compatible with a PCR chip. In one embodiment, the biological sample holder is compatible with a microarray chip comprising the probes described above. In one embodiment, the biological sample holder is stationary. In one embodiment, the biological sample holder is mobile.

D. Other Uses of Probes and Amplicons.

In various embodiments, amplified target nucleic acid sequences can be used for any purpose for which nucleic acids are used. Certain exemplary uses for amplification products include, but are not limited to, forensic purposes, genotyping, sequencing, detecting SNPs, detecting microsatellite DNA, detecting expression of genes, quantifying expression of genes, nucleic acid library construction, melting curve analysis, and any other purpose that involves manipulating and/or detecting nucleic acids or nucleic acid sequences.

In certain embodiments, amplification products may be used in any process that uses nucleic acids. Exemplary assays in which amplification products may be used include, but are not limited to, agarose gel electrophoresis, picogreen assays, oligonucleotide ligation assays, and assays described in U.S. Pat. Nos. 5,470,705, 5,514,543, 5,580,732, 5,624,800, 5,807,682, 6,759,202, 6,756,204, 6,734,296, 6,395,486, U.S. patent application Ser. Nos. 09/584,905 and 09/724,755, and Published U.S. Patent Application No. US 2003-0190646 A1, all of which are herein incorporated by reference. In certain embodiments, amplification products are treated before they are used in a downstream process. Such treatments include, but are not limited to, heating or enzymatic digestion of amplification products prior to their use in a downstream process.

In certain embodiments, high-throughput assay systems are used. In certain embodiments, a high-throughput assay system includes a plurality of multiplex amplification reactions. In certain such embodiments, the plurality of multiplex amplification reactions is contained on one or more plates or cards, in separate reaction spaces (including, but not limited to, wells or spots). In certain such embodiments, each of the plurality of multiplex amplification reactions amplifies two to five target nucleic acid sequences of similar abundance. In certain such embodiments, each of the plurality of multiplex amplification reactions amplifies more than five target nucleic acid sequences of similar abundance. In certain such embodiments, each of the plurality of multiplex amplification reactions includes a sufficient number of differently-labeled probes such that the amplification product of each target nucleic acid sequence can be separately identified. In certain such embodiments, the amplification reaction proceeds using real-time PCR.

Exemplary high-throughput assay systems include, but are not limited to, an Applied Biosystems plate-reader system (using a plate with any number of wells, including, but not limited to, a 96-well plate, a 384-well plate, a 768-well plate, a 1,536-well plate, a 3,456-well plate, a 6,144-well plate, and a plate with 30,000 or more wells), the ABI 7900 Micro Fluidic Card system (using a card with any number of wells, including, but not limited to, a 384-well card), other microfluidic systems that exploit the use of TaqMan probes (including, but not limited to, systems described in WO 04083443 A1, and published U.S. Patent Application Nos. 2003-0138829 A1 and 2003-0008308 A1, all of which are herein incorporated by reference), other micro card systems (including, but not limited to, WO 04067175 A1, and published U.S. Patent Application Nos. 2004-083443 A1, 2004-0110275 A1, and 2004-0121364 A1, all of which are herein incorporated by reference), the Invader® system (Third Wave Technologies), the OpenArray™ system (Biotrove), systems including integrated fluidic circuits (Fluidigm), and other assay systems known in the art. In certain embodiments, multiple different labels are used in each multiplex amplification reaction in a high-throughput multiplex amplification assay system such that a large number of different target nucleic acid sequences can be analyzed on a single plate or card. In certain embodiments, a high-throughput multiplex amplification assay system is capable of analyzing most of the genes in a genome on a single plate or card. In certain embodiments, a high-throughput multiplex amplification assay system is capable of analyzing all genes in an entire genome on a single plate or card. In certain embodiments, a high-throughput multiplex amplification assay system is capable of analyzing most of the nucleic acids in a transcriptome on a single plate or card. In certain embodiments, a high-throughput multiplex amplification assay system is capable of analyzing all of the nucleic acids in a transcriptome on a single plate or card.

When referring to analyzing most of the genes in a genome by performing one or more amplification reactions, for each gene analyzed, either an entire gene may be amplified or a portion of an entire gene may be amplified. When referring to analyzing all of the genes in a genome by performing one or more amplification reactions, for each gene analyzed, either an entire gene may be amplified or a portion of an entire gene may be amplified. When referring to analyzing most of the nucleic acids in a transcriptome by performing one or more amplification reactions, for each nucleic acid analyzed, either an entire nucleic acid or a portion of an entire nucleic acid may be amplified. When referring to analyzing all of the nucleic acids in a transcriptome by performing one or more amplification reactions, for each nucleic acid analyzed, either an entire nucleic acid or a portion of an entire nucleic acid may be amplified.

EXPERIMENTAL

The following examples serve to illustrate certain embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

In the experimental disclosures which follow, the following abbreviations apply: N (normal); M (molar); mM (millimolar); μM (micromolar); mol (moles); mmol (millimoles); μmol (micromoles); nmol (nanomoles); pmol (picomoles); g (grams); mg (milligrams); μg (micrograms); ng (nanograms); pg (picograms); L and l (liters); ml (milliliters); μl (microliters); cm (centimeters); mm (millimeters); μm (micrometers); nm (nanometers); U (units); min (minute); s and sec (second); deg (degree); °C (degrees Centigrade/Celsius); PCR (polymerase chain reaction).

Example 1

This Example describes the development of PCR primers and oligonucleotide probes used in a microarray assay for waterborne bacterial pathogens.

I. Development of primers and probes and materials and methods for providing one embodiment of a microarray assay for a waterborne bacterial pathogen.

Oligonucleotide probe design and chip synthesis. Two sets of probes were designed and included on a test microarray.

A first set of probes (n = 791, SEQ ID NOs: 1-791) was designed to detect 35 VMGs covering 12 pathogens, listed herein. A total of 47 PCR primer sets were designed for these genes (Table 1), with each amplicon being targeted by 7 to 35 probes, with an average of 17 probes per amplicon.

A second set of probes (n = 2,034) targeted 67 VMGs for the 12 pathogens and VMGs for 5 additional pathogens. For these genes, no primers were designed and the probes were analyzed as not-targeted probes to assess the specificity of the probe design.

VMG primers and probes designed and selected in this study were described to detect the selected pathogens in PCR-based assays. Probes were designed based on 671 sequences retrieved from GenBank (May-April 2004) and were selected to satisfy the following criteria: i) the probes were complementary to all retrieved sequences of a given gene with species level specificity, and ii) the probes contained at least two mismatches to all other non-target sequences within the database. At least 2,825 probes were selected satisfying these criteria. The Gibbs free energy of probe-target duplex formation assuming complementary targets (ΔG°duplex) ranged from -14.1 to -23.9 kcal/mole, with a mean of -18.4 and a variability (standard deviation) of 2.1 kcal/mole. Further, oligonucleotide 18-mer probes were designed for 93 virulence and marker genes (VMGs) of 17 pathogens (Tables 1 and 2) and, even further, 50-mer probes were designed (Figure 12).
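The two selection criteria above can be illustrated with a minimal sketch. It is not the software used in this study. For simplicity it assumes that the probe site and all database sequences are written on the same strand, that mismatches are counted without gaps, and that a non-target sequence is scanned at every same-length window; the function names are hypothetical.

```python
def min_mismatches(probe_site, sequence):
    """Smallest number of mismatches between probe_site and any
    same-length, ungapped window of sequence."""
    k = len(probe_site)
    if len(sequence) < k:
        return k  # no full-length window exists
    return min(
        sum(a != b for a, b in zip(probe_site, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    )


def passes_criteria(probe_site, targets, non_targets, min_mm=2):
    """Criterion i: perfect match somewhere in every target sequence.
    Criterion ii: at least min_mm mismatches to every non-target sequence."""
    matches_all_targets = all(probe_site in t for t in targets)
    clear_of_non_targets = all(
        min_mismatches(probe_site, s) >= min_mm for s in non_targets
    )
    return matches_all_targets and clear_of_non_targets
```

A candidate that differs from a non-target sequence at only one position is rejected, since a single mismatch may not be resolved under the hybridization conditions used.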

Each of the 47 VMG regions obtained in this study was targeted by, on average, 17 probes with a range of 7 to 35 probes per region (Table 1).

The designed probes were synthesized in situ on microfluidic chips using a proprietary light-directed synthesis technology developed by the University of Michigan (Gao, X.L., et al., Nucleic Acids Research, 2001. 29(22):p. 4744-4750, herein incorporated by reference) and commercialized by Xeotron (Houston, TX, now part of Invitrogen, Carlsbad, CA).

The microfluidic chip contained 8,000 microreactors with a diameter of 50 μm that were interconnected with flow channels (Fig. 2a). The in situ synthesis technology employed conventional phosphoramidite chemistry with photogenerated acid for deprotection of the 5'-hydroxyl groups of nucleotide monomers for the synthesis of oligonucleotide probes. The probes were attached to the substrate via a spacer consisting of Ts and C18 with an effective length of 12 nucleotides. The microarray also contained at least 22 randomly spaced control spots containing solely linker chemistry to assess background signal intensity.

DNA extraction from bacterial strains and environmental samples. Genomic DNA from at least 12 reference bacteria was used to validate the VMG biochip. Cultures of A. hydrophila (ATCC 7966), Clostridium perfringens (ATCC 12916), Salmonella (ATCC 13311), L. monocytogenes (ATCC 15313), P. aeruginosa (ATCC 10145), V. parahaemolyticus (ATCC 43996), and Y. enterocolitica (ATCC 55075) were obtained from the American Type Culture Collection (ATCC, Manassas, VA). Cells were grown overnight according to the instructions provided by ATCC for the respective organisms. Genomic DNA was extracted from 1 to 2 ml of the cultures using the DNeasy Tissue Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. For V. cholerae (ATCC 39315), C. jejuni (ATCC 700819), H. pylori (ATCC 700392), S. aureus (ATCC 700699), and L. pneumophila (ATCC 33152), purified genomic DNA was obtained from ATCC.

Tap water, river water, and tertiary effluent from a wastewater treatment plant were collected as described herein. The samples were filtered through 0.45 μm nitrocellulose filters (Millipore, Billerica, MA) upon arrival in the lab. DNA was extracted from the microorganisms accumulated on the filters by cutting the filters into at least 8 pieces and processing the filter pieces using the MegaPrep UltraClean™ Soil DNA Kit (Mo Bio Laboratories, Carlsbad, CA) according to the provided instructions. Extracts were further purified through ethanol precipitation, followed by final purification with the QIAquick PCR purification kit (Qiagen). DNA quantification was performed with a NanoDrop® ND-1000 spectrophotometer using a wavelength of 260 nm (NanoDrop Technologies, Wilmington, DE).

Polymerase chain reaction amplification of target genes. Forty-seven primer sets were designed to flank regions of high probe density in 35 VMGs (Table 1). Primers were selected to have similar annealing temperatures (two sets; one with 53 °C and another with 58 °C) and to cover the majority of alleles of a given VMG available in the GenBank database. Primers were synthesized and obtained from Integrated DNA Technologies (Coralville, IA).

Uniqueness of the primers was confirmed by Blast search against the GenBank database. To ensure primer specificity, mismatches to related non-target sequences were located near the 3' end of the primer. For multiplex PCR, primers were segregated into five primer sets with each set containing nine to ten primer pairs. Combinations of primer sets were selected so that each pathogen was targeted in at least two different multiplex PCR reactions (except for P. aeruginosa).

PCR reaction mixtures (25 μl) consisted of 1x PCR buffer, 2 mM MgCl₂, 1.5 units (monoplex PCR) or 3 units (multiplex PCR) of AmpliTaq Gold (Roche Molecular Systems, Switzerland), 200 μM deoxynucleoside triphosphates (Invitrogen), 500 nM of each primer, 200 ng BSA (New England Biolabs, Beverly, MA), and 1 μl of DNA solution. After initial enzyme activation at 94 °C for 10 min, 35 cycles of the following program were used for amplification: denaturation at 94 °C for 60 s, annealing at 53 °C or 58 °C for 60 s, and elongation at 72 °C for 60 s; this was followed by a final elongation at 72 °C for 7 min. PCR mixtures were purified using a QIAquick PCR purification kit (Qiagen), and amplification products were quantified with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies).

Fluorescent labeling of DNA targets (PCR products). PCR products pooled from different PCR reactions were labeled according to an amino-allyl dUTP (aa-dUTP) incorporation and cyanine-dye coupling protocol as described (Stedtfeld, et al., 2007. Appl. Environ. Microbiol. 73:380-389; Wick, et al., Nucleic Acids Res. 34:e26, herein incorporated by reference) with minor modifications: Klenow reactions were incubated for 2 hr instead of 90 min, and dye coupling reactions were incubated for 80 min instead of 60 min. Briefly, PCR products were amplified and aa-dUTP was incorporated in the amplification products with the Bioprime DNA labeling kit (Invitrogen, San Diego, CA) using a 5:1 aa-dUTP:dTTP ratio and an incubation time of 120 min. Purified products were then coupled with cyanine dye by incubation for 80 min in a 1:1 mixture of 0.1 M sodium carbonate buffer (pH 9.3) and N-hydroxysuccinimide ester cyanine dye (Amersham Biosciences).

Microarray hybridizations and melting curve profiles. Hybridizations were performed and melting profiles obtained as described by Wick, L. M., et al., Nucleic Acids Research, 2006. 34(3), herein incorporated by reference. Replicates on one chip were run on consecutive days to minimize storage of used chips. Briefly, after priming of the chips, target DNA (200 pmol of Cy dye) was hybridized overnight at 20 °C using the M-2 hybridization station (Invitrogen, formerly Xeotron Corporation, Houston, TX) with the following hybridization buffer: 35% deionized formamide (Ambion), 6× SSPE (pH 6.6, Invitrogen) and 0.4% Triton X-100 (Sigma). After washing of the chips, the initial point of the melting curve profile was obtained by washing the chip with high stringency wash buffer (20 mM NaCl, 10 mM Na₂PO₄, 5 mM Na₂EDTA, pH adjusted to 6.6 with HCl) for 1.4 min at 25 °C, and imaging of the chip. High stringency wash buffer was degassed under vacuum to prevent formation of air bubbles during the wash steps. Subsequent development of the melting profile was performed by manual cycles of washing and scanning of the chip, repeated at 1 degree intervals until 60 °C was reached. At the end of this series, the chip was additionally stripped for 2.5 min each at 60, 45 and 30 °C with nuclease-free water (Sigma). The tubing of the hybridization station was washed for 20 min before and after each experiment to prevent carry-over between experiments.

Data acquisition. Fluorescence signal intensities were extracted from the scanned images using GenePix 5.0 (Axon Instruments, Union City, CA), yielding values between 0 and 65,535 arbitrary units (a.u.). The median of all pixel intensities within a spot was used as raw spot intensity.
Subsequent data analysis was done with Microsoft Excel (Microsoft, Redmond, WA), and plotting with SigmaPlot 9.0 (Systat Software, Point Richmond, CA). Raw spot intensities were divided by the mean intensity of 22 empty control spots to yield signal-to-noise ratios (SNRs) (Stedtfeld, et al., Appl. Environ. Microbiol. 73:380-389, herein incorporated by reference). The SNR was computed for each wash step between 30 and 45 °C. An overall SNR was subsequently obtained by averaging all SNRs within this interval to include variation in both signal strength and probe-target dissociation behavior. This estimation is equivalent to calculating an overall SNR based on the area under the melting curve within this temperature interval. The SNR for all probes was normalized by dividing by the median SNR of the 2,034 not-targeted probes. The SNR obtained after this normalization step served as the basis for positive/negative calls, with probes yielding a normalized SNR equal to or greater than 3 considered positive. In analyses where averaged SNRs among replicates were required, SNRs were additionally normalized to the replicate with the median SNR (for triplicate experiments), or the replicate with the lowest SNR (for duplicate experiments), using the slope of the linear regression line between replicates as normalization factor. The positive fraction (PF) was computed for each probe set by dividing the number of probes with SNR > 3 within a set by the number of probes within that set (Wilson et al., Mol. Cell. Probes 16:119-127, herein incorporated by reference).
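The SNR computation described above can be illustrated as follows. This is a minimal sketch, not the spreadsheet analysis actually used. It assumes that intensities are stored per probe as a mapping from wash temperature (°C) to raw spot intensity, with a matching mapping for the mean intensity of the empty control spots; all function and variable names are hypothetical.

```python
from statistics import mean, median


def overall_snr(intensity_by_temp, control_mean_by_temp, t_min=30, t_max=45):
    """Average the per-wash-step SNR (spot / mean control) over t_min..t_max."""
    ratios = [
        intensity_by_temp[t] / control_mean_by_temp[t]
        for t in intensity_by_temp
        if t_min <= t <= t_max
    ]
    return mean(ratios)


def normalize_snr(snr_by_probe, not_targeted_ids):
    """Divide every overall SNR by the median SNR of the not-targeted probes."""
    reference = median(snr_by_probe[p] for p in not_targeted_ids)
    return {p: s / reference for p, s in snr_by_probe.items()}


def is_positive(normalized_snr, threshold=3.0):
    """Positive call: normalized SNR equal to or greater than the threshold."""
    return normalized_snr >= threshold
```

Because the wash steps are evenly spaced in temperature, averaging the per-step SNRs is proportional to the area under the melting curve over the same interval, which is why the two descriptions are treated as equivalent.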

Preparation of environmental samples, and set-up for hybridization. The power of the VMG chip to detect pathogens was tested in three different background samples: tap water, river water, and tertiary effluent from a wastewater treatment plant. A mixture of genomic DNA consisting of all 12 pathogens was spiked into DNA extracted from each background. The spike consisted of 0.1% each pathogen (10 pg pathogen DNA, approximately 3,000 genome copies assuming a genome size of 3 Mb, in 10 ng background DNA). Spiked samples and unspiked samples were enriched using multiplex PCR assays. The spiked samples were labeled with Cy3 and the unspiked samples were labeled with Cy5. Both samples were mixed and hybridized on the same VMG chip in duplicate without a dye swap. The above experimental design was also used to assess DNA from the 12 pathogens at an abundance of 0.01% in DNA extracted from river water and an abundance of 0.001% in DNA extracted from tertiary effluent.

Experimental design. The specificity and sensitivity of the assays were evaluated using pathogen genomic DNA spiked into DNA obtained from different water sources (tap water, river water, and tertiary effluent from a wastewater treatment plant). A first set of samples was prepared by spiking 10 pg of genomic DNA of each pathogen (equivalent to approximately 1,400 to 5,500 genome copies depending on the pathogen) into 10 ng of DNA from the different water samples, yielding a relative abundance of 0.1% for each of the pathogens. Both spiked and unspiked samples were enriched using the multiplex PCR assays. The amplicons prepared from spiked water samples were labeled with Cy3 and the amplicons from unspiked water samples were labeled with Cy5. Equal amounts of both samples (100 pmol of each dye or ~2.5 μg of DNA) were mixed and hybridized in duplicate. For the unspiked water samples, 2,034 not-targeted probes were evaluated to address assay specificity.
For the spiked samples, 673 probes (see Results and Discussion for information on selection of probes) were examined to address assay sensitivity. A second set of samples was prepared to assess the detection limit of the assay. This set contained samples with pathogens spiked at a relative abundance level of 0.01% in river water and 0.001% in tertiary effluent.
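The spike arithmetic used above (10 pg of pathogen DNA in 10 ng of background DNA, and the corresponding number of genome copies) can be checked with a short sketch. The average mass of 650 g/mol per base pair of double-stranded DNA is a commonly used approximation and is an assumption here, as are the function names.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def genome_copies(dna_mass_g, genome_size_bp):
    """Approximate number of genome copies contained in a mass of DNA."""
    return dna_mass_g * AVOGADRO / (genome_size_bp * BP_MASS_G_PER_MOL)


def relative_abundance_percent(spike_mass_g, background_mass_g):
    """Spike mass expressed as a percentage of the background DNA mass."""
    return 100.0 * spike_mass_g / background_mass_g


# 10 pg of a 3 Mb genome, spiked into 10 ng of background DNA
copies = genome_copies(10e-12, 3e6)
abundance = relative_abundance_percent(10e-12, 10e-9)
```

Under these assumptions, 10 pg of a 3 Mb genome corresponds to roughly 3,100 copies and a relative abundance of 0.1%, in line with the approximate figures given above. Smaller genomes give proportionally more copies per picogram.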

Data analysis. All data analysis was performed using Excel (Microsoft, Redmond, WA), Systat 11 and SigmaPlot 9.0 (Systat Software, Point Richmond, CA). Signal-to-noise ratio (SNR) was calculated for each probe using the mean signal intensity of 22 control spots (empty wells) as noise (approximately equal to XXX). For each scan, temperature-dependent SNRs were calculated as the ratio of the spot intensity and background signal. An overall SNR was calculated using the area under the melting curve (AUC) between 30 and 45 °C by averaging the SNR measurements in this interval. Each SNR was normalized by dividing the individual SNR by the median SNR of all 2,034 non-targeted probes. A signal was considered positive when its intensity was at least three times the background signal intensity (SNR>3). In cases where averages of SNR among replicates were needed, the SNRs were additionally normalized to the replicate with the median SNR (for triplicate experiments), or the replicate with the lowest SNR (for duplicate experiments).

The metric for presence/absence calls at the VMG level was the positive fraction. See Wilson, W.J., et al., Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Molecular and Cellular Probes, 2002. 16(2): p. 119-127, herein incorporated by reference. The positive fraction was calculated by dividing the number of probes with positive signal for the VMG by the number of probes designed for the VMG. VMGs with positive fractions of 0.5 or higher were scored as present.

Gibbs free energy calculation. Gibbs free energy of probe-target duplex formation (ΔG°duplex) was calculated with the DINAMelt web server (Markham, et al., DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Research, 2005. 33: p. W577-W581, herein incorporated by reference). This server simulates hybridization behavior of nucleic acids in solution using a general statistical mechanics approach developed by Dimitrov, R.A. and M. Zuker, Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 2004. 87(1): p. 215-226, herein incorporated by reference. A temperature of 43 °C was used instead of the actual 20 °C hybridization temperature to account for the presence of 35% formamide in the hybridization solution (see Blake, R. D. and S. G. Delcourt, Thermodynamic effects of formamide on DNA stability. Nucleic Acids Research, 1996. 24(11): p. 2095-2103; Urakawa, H., et al., Single-base-pair discrimination of terminal mismatches by using oligonucleotide microarrays and neural network analyses. Applied and Environmental Microbiology, 2002. 68(1): p. 235-244, all of which are herein incorporated by reference). The Na+ concentration used was 1 M.
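As an illustration of the presence/absence metric, the sketch below computes the positive fraction as the number of probes with a positive signal divided by the number of probes designed for the VMG, and scores the VMG as present at 0.5 or higher. The function names are hypothetical, and the use of "equal to or greater than 3" as the probe-level threshold is an assumption taken from the data acquisition description.

```python
def positive_fraction(normalized_snrs, snr_threshold=3.0):
    """Fraction of a VMG's probes whose normalized SNR meets the threshold."""
    if not normalized_snrs:
        raise ValueError("a VMG needs at least one probe")
    positives = sum(1 for s in normalized_snrs if s >= snr_threshold)
    return positives / len(normalized_snrs)


def vmg_present(normalized_snrs, pf_cutoff=0.5, snr_threshold=3.0):
    """Score a VMG as present when its positive fraction is >= pf_cutoff."""
    return positive_fraction(normalized_snrs, snr_threshold) >= pf_cutoff
```

For example, a VMG with four probes of which two are positive has a positive fraction of 0.5 and is scored as present, whereas one positive probe out of three gives about 0.33 and is scored as absent.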

II. Evaluation of the VMG chip with amplified targets.

Probe success rate was evaluated through hybridization of a target mixture containing all 47 VMG amplicons obtained by individual PCR amplifications. Of the 47 amplicons, 24 were labeled with Cy3 and 23 were labeled with Cy5 (Miller et al., An In-situ Synthesized Virulence and Marker Gene (VMG) Biochip for the Detection of Bacterial Pathogens in Water, in preparation for submission, herein incorporated by reference).

An SNR of 3 was used as a threshold for positive signals. The total number of probes (2,825) was divided into targeted (720) and non-targeted probes (2,105). Twenty-four VMG targets were labeled with Cy3 and twenty-three VMG targets were labeled with Cy5 (Table 1). Both samples were mixed and hybridized to the VMG chip. Using this experimental set-up, 5,650 possible hybridization events could be analyzed. Probes presented with target (targeted probes) and probes not presented with target (non-targeted probes) were evaluated separately.

Table 1. Gene-specific primer sets were designed for 35 targeted VMGs and tested on water samples.

ID No. | Pathogen | VMG amplicon | Designed probes | Probes with S/B > 3, individual PCR | Probes with S/B > 3, multiplex PCR | Present/absent calls and (PF) in spiked environmental water samples
33 | A. hydrophila | alt | 8 | 6 | 5 | + (0.60); + (1.00); + (1.00); - (0.20)
8 | | ast(1) | 15 | 13 | 12 | + (0.71); + (0.79); + (0.71); + (0.58)
[illegible] | | ast(2) | 10 | 9 | 7 | + (0.86); + (1.00); + (1.00); - (0.07)
36 | C. jejuni | cadF | 25 | 21 | 21 | + (0.67); + (0.79); + (0.74); - (0.26)
22 | | cdtA | 14 | 13 | 13 | + (0.92); + (0.85); + (0.92); + (0.77)
46 | | cdtC | 31 | 22 | 21 | + (0.74); + (0.88); + (0.88); + (0.52)
35 | | hipO | 14 | 12 | 12 | + (0.75); + (0.83); + (0.79); - (0.38)
39 | C. perfringens | cpe(1) | 11 | 9 | 9 | - (0.00); - (0.00); - (0.11); - (0.06)
 | | cpe(2) | 9 | 8 | 7 | - (0.29); - (0.07); - (0.07); - (0.00)
 | | plc(1) | 17 | 14 | 14 | - (0.04); - (0.07); - (0.07); - (0.14)
 | | plc(2) | 16 | 11 | 11 | - (0.09); - (0.27); - (0.36); - (0.18)
 | H. pylori | flaA(1) | 18 | 18 | 18 | + (0.89); + (0.92); + (1.00); + (0.75)
 | | flaA(2) | 8 | 8 | 7 | + (1.00); + (1.00); + (1.00); - (0.21)
 | | ureB(1) | 11 | 10 | 10 | + (0.70); + (0.85); + (0.75); + (0.5)
 | | ureB(2) | 23 | 23 | 23 | + (0.91); + (0.87); + (0.87); - (0.28)
 | L. monocytogenes | inlA | 19 | 19 | 19 | + (0.97); + (0.89); + (1.00); - (0.45)
 | | inlB | 10 | 9 | 9 | + (0.89); + (0.78); + (0.67); - (0.33)
 | | hsO | 17 | 13 | 11 | + (0.55); + (0.73); + (0.55); - (0.27)
 | | plcA | 32 | 21 | 16 | + (0.84); + (0.63); + (0.56); - (0.13)
 | | plcB | 18 | 15 | 15 | + (0.87); + (0.87); + (0.83); - (0.43)
 | L. pneumophila | dnaJ(1) | 17 | 16 | 16 | + (0.78); + (0.75); + (0.78); - (0.28)
 | | dnaJ(2) | 17 | 17 | 16 | + (0.81); + (0.75); + (0.75); + (0.53)
 | | mip | 8 | 7 | 7 | + (0.71); + (0.57); + (0.57); - (0.43)
 | P. aeruginosa | oprL | 18 | 18 | 16 | + (1.00); + (0.94); + (1.00); - (0.28)
 | Salmonella | fimA | 21 | 21 | 20 | + (1.00); + (0.98); + (1.00); + (0.93)
 | | hilA(1) | 12 | 12 | 12 | + (1.00); + (1.00); + (1.00); + (0.75)
 | | hilA(2) | 23 | 21 | 21 | + (0.90); + (0.95); + (0.90); - (0.45)
 | | invA | 15 | 15 | 12 | + (0.92); + (0.92); + (0.58); - (0.33)
 | | invC(1) | 18 | 18 | 17 | + (0.97); + (0.91); + (0.65); - (0.38)
30 | | invE(2) | 9 | 9 | 9 | + (0.56); + (0.72); + (0.61); - (0.22)
43 | S. aureus | nuc(1) | 10 | 10 | 9 | - (0.44); + (0.50); + (0.56); - (0.0)
38 | | nuc(2) | 11 | 10 | 10 | + (0.60); + (0.60); + (0.65); - (0.3)
42 | | seC | 12 | 9 | 8 | - (0.38); + (0.63); - (0.38); - (0.19)
34 | | tsst(1) | 17 | 17 | 17 | - (0.35); - (0.38); - (0.29); - (0.12)
44 | | tsst(2) | 13 | 10 | 10 | + (0.50); + (0.65); + (0.70); - (0.15)
40 | V. cholerae | ctxB | 20 | 20 | 2 | X; X; X; X
15 | | hlyA | 15 | 15 | 15 | + (0.94); + (0.97); + (0.94); - (0.47)
5 | | ompU | 18 | 18 | 18 | + (0.79); + (0.82); + (0.53); - (0.21)
16 | | toxR | 35 | 31 | 31 | + (0.92); + (0.90); + (0.85); - (0.40)
3 | | zot | 31 | 31 | 31 | + (0.79); + (0.95); + (0.79); - (0.45)
19 | V. parahaemolyticus | tdh | 20 | 20 | 19 | + (0.92); + (0.97); + (0.95); + (0.71)
7 | | tlh(1) | 19 | 19 | 19 | + (0.92); + (0.92); + (0.73); - (0.38)
20 | | tlh(2) | 13 | 13 | 13 | + (0.92); + (0.92); + (0.95); - (0.45)
25 | | toxR(1) | 20 | 20 | 19 | + (1.00); + (1.00); + (1.00); + (0.50)
12 | | toxR(2) | 12 | 12 | 12 | + (0.62); + (0.87); + (0.70); - (0.30)
10 | Y. enterocolitica | ail | 32 | 30 | 30 | + (0.67); + (0.67); + (0.75); - (0.33)
1 | | ystA | 7 | 7 | 6 | + (0.60); + (1.00); + (1.00); - (0.20)
Total | | | 791 | 720 | 673 |

The experiment described above was performed using 24 amplified regions labeled with Cy3 and 23 amplified regions labeled with Cy5 to better evaluate the specificity and behavior of the targeted probes (Table 1). Of the 393 probes presented with their Cy3 labeled targets, 337 (85.5%) displayed positive signals in at least three replicates. Furthermore, 10 of the 393 (2.5%) probes displayed positive signals in at least one of the replicates with the Cy5 labeled amplicons, with SNR always less than 4.4 (median 3.4). Of the 398 probes presented with their Cy5 labeled targets, 383 (96.2%) yielded positive signals in at least three replicates. Furthermore, only 18 of these 383 (4.5%) probes displayed positive signals with the Cy3 labeled non-target sample. Of the 2,034 non-targeted probes, 27 (1.3%) and 34 (1.7%) probes displayed positive signals with the Cy3 and Cy5 labeled non-target samples, respectively. The SNRs of these non-specific hybridization events were considerably smaller than the signals observed for probes presented with targets. SNR for the latter ranged from 3 to 1,000, while only 8 of the non-targeted probes yielded SNR higher than 5. The good separation between the signal distributions for targeted and non-targeted probes demonstrates the specificity of the probes (Fig. 2B). Overall, of the 791 targeted probes, 720 (91.0%) produced positive signals with SNR up to 1,000. Of the 2,034 non-targeted probes, 61 probes (3.0%) yielded positive signals with either the Cy3 or Cy5 labeled amplicons. The SNRs for targeted and non-targeted probes were well separated, except for a small fraction of the probes (Fig. 2B), indicating the good discriminatory power of the probes. Taken as a whole, out of the 5,650 interrogated individual hybridization reactions, 95% behaved as expected in terms of positive/negative hybridization signal.

A positive fraction (PF) was calculated for each probe set (and corresponding amplicon) by dividing the number of positive probes by the number of interrogated probes within that set. A PF obtained based on the number of initially designed probes is equivalent to the success level of probe selection for individual probe sets. Of the 47 probe sets presented with targets, 22 sets displayed a positive fraction of 1, indicating that 100% of the probes in that set yielded positive signal. Eighteen sets displayed a positive fraction between 1 and 0.8, and 7 sets displayed a positive fraction between 0.8 and 0.6 (Fig. 2C). Wilson et al. (Wilson, W.J., et al., Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Molecular and Cellular Probes, 2002. 16(2): p. 119-127) also observed positive fractions between 0.8 and 1 for 142 probe sets when hybridized with amplified targets. Of the 67 non-targeted probe sets, 34 sets displayed a positive fraction of 0, and 31 sets displayed a positive fraction between 0 and 0.1. The two highest positive fractions for non-targeted probe sets were 0.125 and 0.3. Targeted VMGs displayed a positive fraction above 0.5 while the non-targeted VMGs displayed a positive fraction much lower than 0.5. Accordingly, a positive fraction of 0.5 was adopted as a threshold for presence/absence calls at the gene level. A plot of the number of positive probes vs. the number of designed probes also demonstrates the excellent separation between the positive fractions for targeted and non-targeted probe sets (Fig. 2D). It is also evident that the positive fraction was not related to the number of designed probes. High success rates (>90%) for the design of oligonucleotide probes have been described in numerous recent studies for 16S rRNA genes and sets of functional genes, and demonstrate that current probe design approaches are adequate and successful.
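
The positive fraction and the presence/absence call described above can be illustrated with a short sketch. The function names and the example SNR values are hypothetical; only the SNR threshold of 3 and the PF threshold of 0.5 come from the text.

```python
def positive_fraction(n_positive, n_interrogated):
    """PF = number of positive probes / number of interrogated probes in the set."""
    if n_interrogated <= 0:
        raise ValueError("a probe set must contain at least one probe")
    return n_positive / n_interrogated

def call_present(pf, threshold=0.5):
    """Gene-level call: present when the PF is 0.5 or higher."""
    return pf >= threshold

def call_probe_set(probe_snrs, snr_threshold=3.0, pf_threshold=0.5):
    """Return (PF, present?) for one probe set given the SNR of each probe."""
    n_positive = sum(1 for s in probe_snrs if s >= snr_threshold)
    pf = positive_fraction(n_positive, len(probe_snrs))
    return pf, call_present(pf, pf_threshold)

# Hypothetical probe set of five probes: three exceed the SNR threshold.
example_pf, example_call = call_probe_set([10.0, 4.0, 3.5, 1.0, 0.5])
```

Because the call is made on the fraction of positive probes rather than on any single probe, an isolated cross-hybridizing probe does not by itself produce a positive call for the gene.
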
Substantial differences in SNR were observed among probes targeting the same VMG (Fig. 3B). Within probe sets, differences of up to 100- to 500-fold were observed. When sorting the probe sets according to their median SNR, an overall trend of increasing positive fraction with higher median SNRs becomes apparent (Fig. 3A). The G+C content of the amplified VMG region displayed a similar trend (Fig. 3C). This trend also explained, at least in part, the overall lower success level for the probe sets hybridized with the Cy3 labeled amplicons (85.5%) in comparison with probe sets targeted by the Cy5 labeled amplicons (96.2%). The G+C content for Cy3 labeled amplicons (36.5 ± 5.4%) was, on average, lower than the G+C content of the Cy5 labeled amplicons (48.6 ± 7.9%). This also implies that probe design for genes or genomes with a low overall G+C content may yield lower success rates if continuous regions with higher G+C content are not present.

Multiplex amplification of VMG targets. The use of multiplex PCR can greatly increase the efficiency of sample analysis, but it can also introduce bias. This bias was examined by hybridizing targets produced in multiplex PCR and comparing the VMG chip response to that obtained with individually amplified gene regions. Out of 720 probes giving positive signal in response to individually amplified targets, 673 yielded positive signals in response to multiplex amplified targets (Table 1). Twenty-eight of the probe sets responded the same to both individually amplified and multiplex amplified targets. Thirteen probe sets displayed a one-probe difference, three sets displayed a two-probe difference, one set displayed a three-probe difference, and one set displayed a five-probe difference between individually and multiplex amplified targets. The probes that did not respond to multiplex targets displayed relatively low SNR in response to individually amplified targets (data not shown). Eighteen of 20 probes for the ctxB gene of V. cholerae displayed a differential response to multiplex amplified targets. This was the only gene region that was not amplified in multiplex PCR to an abundance detectable via the VMG chip; it displayed a positive fraction of 0.1 (2 positive out of 20 probes).

Success level of probes as a function of ΔG°duplex. The use of hybridization free energy (ΔG°) was evaluated as a parameter to facilitate in silico probe selection. The 791 targeted probes were sorted according to ΔG°duplex, binned (40 probes per bin), and the percent positive probes in each bin was calculated. The latter is equivalent to the success level for probes with ΔG°duplex within the bin range. A drastic decrease in the success rate was observed for probes with a ΔG°duplex less negative than -17 kcal/mole (Fig. 4), and a corresponding G+C content lower than 34.4%. Similar analyses with variable SNR thresholds displayed analogous trends, but were shifted towards more negative ΔG°duplex for increasing SNR thresholds (not shown). The factors contributing to the unevenness in probe signals for a given target may explain the lack of hybridization signal for probes with a highly negative ΔG°duplex. Therefore, -17 kcal/mole might be a useful threshold to increase the success rate of probe selection. Similarly, Loy et al. (Loy, A., et al., 16S rRNA gene-based oligonucleotide microarray for environmental monitoring of the betaproteobacterial order "Rhodocyclales". Applied and Environmental Microbiology, 2005. 71(3): p. 1373-1386.) proposed a threshold of -16 kcal/mole for 18-mer oligonucleotide probes targeting 16S rRNA genes. Other studies also demonstrated the utility of ΔG° as a probe selection parameter. See Matveeva, O. V., et al., Thermodynamic calculations and statistical correlations for oligo-probes design. Nucleic Acids Research, 2003. 31(14): p. 4211-4217; Luebke, K.J., R.P. Balog, and H.R. Garner, Prioritized selection of oligodeoxyribonucleotide probes for efficient hybridization to RNA transcripts. Nucleic Acids Research, 2003. 31(2): p. 750-758. The selection criteria mentioned above were based on hybridization patterns observed with high abundance and low complexity samples.
Rules derived from such experiments are not necessarily stringent enough to be applied to work with environmental samples. Rules for such applications are discussed below. Furthermore, small deviations from the proposed optima might occur for other hybridization platforms and experimental conditions.
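
The sort-and-bin procedure used for the success-level analysis can be sketched as follows. This is an illustration under stated assumptions, not the inventors' implementation: the function name is hypothetical, the input is assumed to be a list of (ΔG°duplex, positive?) pairs, and the example probes are synthetic values chosen only to exercise the code.

```python
def success_by_dg_bin(probes, bin_size=40):
    """Sort probes by duplex free energy and report percent positive per bin.

    probes: list of (dg_kcal_per_mol, is_positive) pairs.
    Returns a list of (min_dg, max_dg, percent_positive), most negative bin first.
    The last bin may hold fewer than bin_size probes.
    """
    ordered = sorted(probes, key=lambda p: p[0])
    bins = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        dgs = [dg for dg, _ in chunk]
        n_positive = sum(1 for _, positive in chunk if positive)
        bins.append((min(dgs), max(dgs), 100.0 * n_positive / len(chunk)))
    return bins

# Synthetic example: 40 strongly binding probes (all positive) and
# 40 weakly binding probes (half positive).
synthetic = [(-25.0 + 0.1 * i, True) for i in range(40)]
synthetic += [(-15.0 + 0.1 * i, i % 2 == 0) for i in range(40)]
example_bins = success_by_dg_bin(synthetic)
```

Plotting the percent positive of each bin against its ΔG° range gives the kind of success-level curve referred to in Fig. 4.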

The results of the inventors' experiments display a weak albeit significant correlation between probe hybridization ΔG° and SNR (data not shown), as do the studies of Matveeva et al. and Luebke et al. This weak correlation and the large variability within each ΔG° bin described above caused some to reject probe hybridization ΔG° as a probe selection parameter. See Pozhitkov, A., et al., Tests of rRNA hybridization to microarrays suggest that hybridization characteristics of oligonucleotide probes for species discrimination cannot be predicted. Nucleic Acids Research, 2006. 34(9). The inventors' results show that probe hybridization ΔG° does have value as a selection parameter: it can be used to predict the behavior of groups of probes with accuracy (Fig. 5). Furthermore, strong hybridization signals were mainly observed by the inventors for probes with a highly negative

Based on the analysis of responsive and non-responsive probes, a ΔG°duplex threshold of -17 kcal/mole serves as a valuable design rule for future oligonucleotide probe (18-mer) screening exercises. This recommendation is in agreement with Loy et al., who proposed a ΔG°duplex threshold of -16 kcal/mole for 18-mer probes targeting the 16S rRNA gene. The latter was suggested based on the observation that the majority of probes displaying positive signal (including cross-hybridization signals) were attributed to duplexes with a ΔG°duplex more negative than -16 kcal/mole. It should be noted that small deviations from the above ΔG°duplex thresholds may be observed for hybridizations performed under different experimental conditions and/or for theoretical ΔG°duplex estimates using different nearest neighbor model parameters. Nevertheless, interpreting the -17 kcal/mole ΔG°duplex threshold for 18-mers in terms of their inferred G+C content (34.3%) suggested that the derived values might be applicable to most studies. It should be recognized, however, that the above ΔG°duplex criterion was derived from hybridization patterns observed with high abundance and low complexity target mixtures. Hence, probe design rules derived from such hybridizations may not be stringent enough to be applied to nucleic acid target mixtures obtained from environmental samples, which contain low abundance target sequences in the presence of an excess of non-target sequences.

Performance of the VMG chip with environmental samples. The performance of the VMG biochip probe sets was further validated by hybridizing the biochip with PCR amplification products generated from pathogen genomic DNA spiked into total DNA extracted from three different water samples: tap water, tertiary effluent from a wastewater treatment plant, and river water.
In the following analysis, PFs for targeted amplicons were calculated based on the 673 probes selected after hybridization of amplicon mixtures from multiplex PCR.

Genomic DNA of 12 pathogens was spiked at an abundance of 0.1% into DNA obtained from three background samples: tap water (drinking water), river water and waste water treatment plant tertiary effluent. Eleven pathogens were detected in all three backgrounds, with all of the gene regions for these organisms indicating presence (Table 1). The response of S. aureus probes indicated presence in all three backgrounds, although the gene regions were in disagreement between background samples. The gene regions of C. perfringens were not detected in any of the background samples. This is likely due to the low G+C content of the C. perfringens probe sets.

The median positive fractions of the 46 targeted probe sets were 0.8, 0.82, and 0.79 in tap water, tertiary effluent, and river water, respectively (Fig. 6). This consistency across three different background samples illustrates the robustness of the chip. The median positive fractions for the 67 non-targeted gene regions were 0.03, 0.09, and 0.18 for the three backgrounds, with tap water resulting in the narrowest distribution of non-targeted fractions and river water resulting in the widest. This could be due to the relative complexity of the background samples, with a greater amount of cross-hybridization expected in samples containing a greater diversity of target sequences. Using a positive fraction threshold of 0.5, the number of detectable gene regions was 39, 40, and 37, again illustrating the robustness of the chip. Positive fractions presented in the preceding analysis were based on the 673 probes that produced positive signals in response to multiplex amplified targets.

Six pathogens were detected in river water background DNA when spiked at an abundance of 0.01%. Even though these six organisms were detected, no more than half of the probe sets targeting each organism indicated presence. This illustrates the utility of targeting multiple gene regions in the detection of organisms present at low abundance. These detection limits are comparable to those obtained in studies by Maynard et al. and Wilson et al. Maynard et al. (Maynard, C., et al., Waterborne pathogen detection by use of oligonucleotide-based microarrays. Applied and Environmental Microbiology, 2005. 71(12): p. 8548-8557.) were able to detect Salmonella at a relative abundance of 0.1% by targeting. In a study similar to the one described here, Wilson et al. were able to detect in a background of DNA obtained from air samples based on a positive fraction threshold of 0.8. Other studies described lower detection limits, albeit in different matrices. Pathogen templates spiked into tertiary effluent background DNA at an abundance of 0.001% resulted in no detection.

Hybridization free energy range associated with best probe selectivity. Probe selectivity, defined as the difference between the percent positive probes for the targeted and non-targeted probes, is presented in Fig. 6 as a function of ΔG°. The highest selectivity of 80% was obtained for probes with a ΔG° of approximately -19.3 kcal/mole. Probes with a ΔG° higher or lower than this value displayed lower selectivity. More than 70% selectivity was obtained for probes with a ΔG° between approximately -18.6 and -21.1 kcal/mole, which correlates to a G+C content of 47.2±5%. Two distinct ΔG° regions were evident: region 1 with a ΔG° lower than -19 kcal/mole and region 2 with a ΔG° higher than -19 kcal/mole. Moving from the lowest ΔG° to -19 kcal/mole, the positive bin fraction of the targeted probes is consistent at 0.9 in region 1 while the positive bin fraction of the non-targeted probes decreases with a slope of -0.07. Moving from -19 kcal/mole to higher ΔG°, the positive bin fraction of the targeted probes decreases with a slope of -0.17 in region 2 while the positive bin fraction of the non-targeted probes remains relatively constant.

A probe having a more negative ΔG° is more likely to give a positive signal, but it is also more prone to cross-hybridization with non-targets (Held, G. A., G. Grinstein, and Y. Tu, Modeling of DNA microarray data by using physical properties of hybridization. Proceedings of the National Academy of Sciences of the United States of America, 2003. 100(13): p. 7575-7580 and Fig. 7 top panel). In contrast, a probe having a more positive ΔG° has a low tendency to hybridize with non-targets as well as a lower likelihood of giving a positive signal. Starting from the optimal ΔG° of -19.3 kcal/mole, the increase in the cross-hybridization positive bin fraction when moving in the more negative direction is less drastic than the decrease in the target hybridization positive bin fraction when moving in the more positive direction. This means that if there is a choice between selecting a probe that has a ΔG° of -20 or -18 kcal/mole, the former is more likely to give a higher selectivity. Consequently, it is more difficult to obtain quality probes for, and to detect, lower G+C content regions than higher G+C content regions.

For this analysis, probes were sorted according to their ΔG° values, and the fraction of probes with positive signal was calculated for bins of 40 targeted probes and bins of 80 non-targeted probes (Fig. 6 top panel). The hybridizations using tap water as a background resulted in very little non-targeted probe signal (cross-hybridization), so the non-targeted probes from that background were not included in this analysis. The targeted probes from all three background experiments were assessed. Probe selectivity, defined as the capability of a probe to detect intended targets with the exclusion of non-targets, is the parameter that needs to be optimized during probe selection. Probe selectivity was accordingly calculated as the percent of targeted probes giving positive signal minus the percent of non-targeted probes yielding positive signals. A peak-shaped profile was observed when plotting the selectivity as a function of ΔG° (Fig. 5B). The highest selectivity (approximately 80%) was observed for probes with a ΔG° of -19.3 kcal/mole. Probes with a ΔG° different from this optimum displayed lower selectivities. Higher than 70% selectivity was observed for probes with a ΔG° between -18.6 and -21.1 kcal/mole. Using a relationship between a probe's G+C content and its ΔG°, this amounts to a G+C content range between 43.4 and 56.3%, with an optimum at 47.2%. For probes with more negative ΔG° values, the decreased selectivity was due to the increased tendency of such probes to cross-hybridize with non-targets (Fig. 4A). For probes with less negative ΔG° values, the observed decrease in selectivity was explained by the decrease in percent positive probes for targeted probes (Fig. 4A). Finally, it is interesting to observe that the decrease for probes with less negative ΔG° values was more pronounced than that for probes with more negative ΔG° values. Two aspects need to be recognized when interpreting the results in Fig. 5.
First, the amount of cross-hybridization depends on the sample that is hybridized. For this analysis, results from river water and tertiary effluent were combined with exclusion of the results from tap water since only minimal cross-hybridization was observed with this background. Second, the drastic decrease in percent positive probes for targeted probes will be less pronounced for more abundant targets.
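
The selectivity calculation (percent of targeted probes positive minus percent of non-targeted probes positive) can be sketched as below. The sketch assumes that the targeted and non-targeted percentages have already been placed on a common set of ΔG° bin centers, which the text does not specify; the function names are hypothetical, and the example percentages are invented solely to mimic a peak-shaped profile and are not data from this work.

```python
def selectivity_profile(dg_centers, pct_targeted, pct_nontargeted):
    """Selectivity per bin = percent targeted positive - percent non-targeted positive."""
    if not (len(dg_centers) == len(pct_targeted) == len(pct_nontargeted)):
        raise ValueError("all inputs must have one value per bin")
    return [(dg, t - n) for dg, t, n in zip(dg_centers, pct_targeted, pct_nontargeted)]

def best_bin(profile):
    """Return the (dg_center, selectivity) pair with the highest selectivity."""
    return max(profile, key=lambda item: item[1])

# Invented illustration only: bin centers in kcal/mole and percent-positive values.
centers = [-23.0, -21.0, -19.3, -18.0, -16.0]
targeted = [90.0, 90.0, 88.0, 60.0, 20.0]
nontargeted = [35.0, 20.0, 8.0, 5.0, 4.0]
example_profile = selectivity_profile(centers, targeted, nontargeted)
example_best = best_bin(example_profile)
```

On the more negative side the loss of selectivity comes from the rising non-targeted percentage, while on the less negative side it comes from the falling targeted percentage, which is the asymmetry discussed above.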

Hybridization free energy was shown to be a valuable parameter for the selection of promising probes. Probe selectivity was highly dependent on ΔG°, and a maximum was observed for probes with a ΔG° of -19.3 kcal/mole. The capability of theoretical (solution-based) models to predict hybridization signals on microarrays, and their applicability in facilitating probe selection, remains in debate. See Loy, A., et al., 16S rRNA gene-based oligonucleotide microarray for environmental monitoring of the betaproteobacterial order "Rhodocyclales". Applied and Environmental Microbiology, 2005. 71(3): p. 1373-1386; Pozhitkov, A., et al., Tests of rRNA hybridization to microarrays suggest that hybridization characteristics of oligonucleotide probes for species discrimination cannot be predicted. Nucleic Acids Research, 2006. 34(9), herein incorporated by reference. Although it is recognized that other factors, besides ΔG°duplex, determine the signal strength of hybridization events, the inventors propose that ΔG°duplex is a valuable parameter that can be used to increase the success rate of in silico probe selection.

In conclusion, the inventors developed and validated a coupled format of multiplex PCR and an in situ synthesized DNA biochip for the detection of 12 bacterial pathogens in various water samples. By using redundant probe sets targeting various VMGs and inference of presence/absence calls based on the PF, false-positive calls could be eliminated. Pathogens could be detected at a relative abundance level of 0.1% to 0.01%, depending on the pathogen and VMG. Analysis of the hybridization patterns also yielded optimal probe selection criteria for short oligonucleotides. Finally, the relatively high overall success rate of primer and probe design observed in this work, along with implementation of the optimized probe design criteria, indicates that unvalidated VMGs are contemplated for use to detect untested pathogens.

Example 2

This example describes on-chip PCR using primers identified and designed according to the present invention.

DNA Targets. Genomic DNA from 17 bacterial pathogens (Table 1) was used as the source of virulence and marker genes. A. hydrophila, C. perfringens, E. faecalis, L. monocytogenes, L. pneumophila, P. aeruginosa, S. enterica, V. parahaemolyticus, and Y. enterocolitica type strains were obtained from the American Type Culture Collection (ATCC; Manassas, VA) and grown as per the protocol provided. For H. pylori, C. jejuni, C. parvum, G. intestinalis, K. pneumoniae, S. aureus, and V. cholerae, only genomic DNA was obtained from the ATCC. E. coli was kindly provided by Dr. Thomas Whittam at Michigan State University. DNA from pure cultures was extracted using the Promega Wizard DNA Extraction Kit (Promega, Madison, WI).

Genomic DNA Background Samples. Tap water (drinking water) (40 L), river water (10 L) from the Red Cedar River in East Lansing (MI), and tertiary effluent (20 L) from a local wastewater treatment plant (East Lansing, MI) were collected. Samples were filtered through 0.45 μm nitrocellulose filters (Millipore, Billerica, MA) immediately after collection. Genomic DNA was extracted from the filters as instructed with the MegaPrep UltraClean™ Soil DNA Kit (Mo Bio Laboratories, Carlsbad, CA). Genomic DNA was purified further through ethanol precipitation, followed by final purification with the QIAquick PCR purification kit (Qiagen). All DNA quantification was performed with the NanoDrop® ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE).

PCR Primer Design. Ninety-six virulence and marker genes (VMG) from 20 human pathogens were targeted. Primer design consisted of the following: i. Kodon (Applied Maths, Austin, TX) was used to generate 96 consensus sequences, one for each VMG (>4 genes for each organism); ii. Primer Express (Applied Biosystems, Foster City, CA) was used to design primers from the consensus sequence. A majority of designed amplicons had a maximum length of 150 bases and a Tm of 59 °C. Some genes required longer amplicons (<250 bases) for generating acceptable primer pairs; iii. Designed primers were BLASTed against the GenBank database using a script that automatically highlighted unspecific primers. Specificity was based on the extent of 3' end perfect matches to other bacterial sequences. Sequences were selected manually based on the results of the BLAST output; iv. When available, primers described in the literature for successful quantitative PCR were also used. Overall, 110 primer pairs were designed or extracted from the literature.

PCR on BioTrove OpenArray. Primers were synthesized by Sigma-Aldrich (St. Louis, MO) and pre-loaded (128 nM) into BioTrove OpenArray plates (Woburn, MA). Two subarrays (each having 64 wells for 56 assays) were used for each PCR reaction sample. PCR reaction mixtures (10 μl per sample or 5 μl for each sample array) consisted of 1x LightCycler Fast Start DNA Master SYBR Green I Kit (Roche Applied Sciences, Indianapolis, IN), 1.6x SYBR Green I, 0.5% glycerol, 0.2% Pluronic F-68, 1 mg per ml BSA (New England Biolabs, Beverly, MA), 2.5 mM MgCl₂, 8% formamide, and a DNA mixture.

Experimental Design. Experimental samples consisted of the following: 1) Genomic DNA from 17 pathogenic organisms (pure cultures) was tested individually (6 ng in total sample or 20 pg per reaction well). 2) Genomic DNA from 4, 8, and 17 pathogenic organisms was mixed and tested simultaneously (20 pg per reaction well per organism). 3) Genomic DNA from 17 pathogens was mixed together and spiked at various concentrations into background genomic DNA from waste water tertiary effluent and river water samples (20 pg, 2 pg, 200 fg, 20 fg, 2 fg of each of 17 pathogens mixed together spiked into 66.6 pg of background genomic DNA per reaction well). Samples were tested in triplicate.
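
As a worked illustration of the spiking series in item 3, the mass fraction of each pathogen's DNA in a reaction well can be computed as below. This sketch assumes that each of the 17 pathogens is present at the stated mass per well, that the 66.6 pg of background DNA is in addition to the spiked DNA, and that masses are simply additive; the function name is hypothetical and the resulting fractions are not values reported in the text.

```python
def pathogen_mass_fraction(spike_pg, n_pathogens=17, background_pg=66.6):
    """Mass fraction of one pathogen's DNA in a well (all masses in picograms)."""
    total_pg = n_pathogens * spike_pg + background_pg
    return spike_pg / total_pg

# Spike levels from the text, converted to picograms (1 pg = 1000 fg).
spike_levels_pg = {"20 pg": 20.0, "2 pg": 2.0, "200 fg": 0.2, "20 fg": 0.02, "2 fg": 0.002}

example_fractions = {
    label: pathogen_mass_fraction(mass) for label, mass in spike_levels_pg.items()
}
```

At the lowest spike levels the background DNA dominates the well, so the fraction scales almost linearly with the spiked mass.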

Table 2: List of Pathogens Selected for On-Chip PCR Methods Development.

Genus | Species targeted | Marker genes targeted (sequences used to make consensus sequence for primer design) | Designed primer pairs | Primer pairs from literature | ATCC organisms tested with BioTrove
Aeromonas | hydrophila | alt, exeF, tapA, arcV | | | 7966
Burkholderia | mallei, pseudomallei | pilA, pilD, orf13, orf11 | [illegible] | |
Campylobacter | jejuni | hipO, racR, mapA, gyrA, cdtC, cdtB, cdtA | | | 700819
Clostridium | perfringens | cpe, plc, pfo, cpbsl, etx | | | 12916
Enterococcus | faecalis, faecium | cylA, ace, cylLS, esp, gelE | | | 19433
Escherichia (including Shigella) | coli | uidA, stx1, stx2, eae, papG | | | O157:H7 (from lukas)
Helicobacter | pylori | cagA, cagE, ureA, virD4 | 3 | 2 | 700392
Klebsiella | pneumoniae | nuc, magA, kvgS, kci, kca | 5 | - | 700721
Legionella | pneumophila | lepB, lcmQ, lepA, imp | 4 | 1 | 33152
Leptospira | interrogans | hlyB, lipA, chpl, lip21, ompL1 | 4 | 1 | -
Listeria | monocytogenes | actA, hlyA, inlA, mpl, plcB | 4 | 1 | 15313
Mycobacterium | MAC paratuberculosis, MTB tuberculosis, leprae | erp, glnA1, mmpA, mmpB, IS900 | 4 | 1 | -
Pseudomonas | aeruginosa | exoS, lasA, pcrV, pilDXcpA, popD | 5 | | 10145
Salmonella | salmonella | fimA, fliC, fljB, invA | 3 | 1 | 13311
Staphylococcus | aureus | alphahly, seA, seC, tsst1 | 3 | 1 | 700699
Vibrio | cholerae, mimicus, parahaemolyticus, vulnificus | ace, ompU, tdh, tlh, toxR, ctxA, ctxB, mshA, tcpA, zot | 9 | 2 | 39315, 43996
Yersinia | enterocolitica, pestis, pseudotuberculosis | [illegible], yopD, yscD, ystsA, bipA | 4 | [illegible] |
Cryptosporidium | parvum, hominis | gp40, hsp70, COWP, cp23, cgd7 | 8 | [illegible] | PRA-67, genotype II
Giardia | intestinalis, lamblia | β-giardin, VSPH71, VSP4177, VSP4173 | [illegible] | [illegible] | 30888
Total | 20 | 96 | (88 + 22) = 110 | | 17

Example 3.

This example describes developmental stages of microfluidics systems for use in detecting pathogens using PCR primers and 20 mer and 50 mer PCR oligonucleotide probes designed and used according to methods of the present invention. Further, this example demonstrates the use of these oligonucleotide probes in combination with microfluidic and serpentine chips (for example, see Figures 10 and 11) for PCR reactions (Hashsham, et al., Microbe, Volume 2, Number 11, 2007, herein incorporated by reference) (Figure 14).

Microfluidics-based assays were used for detecting and quantifying infectious agents by hybridizing PCR amplified products onto oligonucleotide probes. For example, the inventors developed and validated a chip (Fig. 11) containing 8,000 microreactors, each with a diameter of 50 microns. Each reactor had oligonucleotide probes synthesized in situ using a low-cost, light-directed DNA synthesis technology. The chip was used to screen 20 different pathogens per run, based on their respective virulence and marker genes.

Presence of each pathogen was confirmed by targeting 3-6 genes per organism. Positive signals were confirmed by hybridizing amplicons for each gene to 5-20 standardized probes (FIG. 11d). This method was found to lower false positives relative to standard approaches that rely on a single marker (FIG. 14B).

More specifically, the inventors used methods of providing probe sequences, as described herein, for generating 50 mer probes (SEQ ID Nos. 792-11533, Figure 12). These sequences underwent initial evaluations, wherein at least SEQ ID Nos. 11534-13225 and SEQ ID Nos. 13226-19206 were validated (Figure 13) for hybridizing to test pathogen sequences (Figure 14A and 14B).

One of the most challenging tasks in using microfluidics-based chips with oligonucleotide probes of the present inventions was sealing the chip after primer and sample placement, because the small reagent volume evaporates within a single cycle if leaks are present. The inventors demonstrate a leakproof amplification reaction (FIG. 15(a)) with real-time monitoring (FIG. 15(b)). In this experiment, the products diffused throughout the chip, resulting in a relatively low signal-to-noise ratio (SNR).

Presence of a product of the correct size was confirmed by standard gel electrophoresis (FIG. 15(c)). A key point noted by the inventors was the appearance of the product after the 15th cycle.

Example 4.

This example describes the stability of freeze-dried Taq polymerase and the optimization of trehalose concentrations for use in compositions and methods of the present inventions.

For field applications of a microarray (PCR) chip comprising primers and probes of the present inventions, the inventors contemplate chips with primers and reagents already dispensed in them. However, this implies that the primers, polymerase, and reagents must be made stable at room temperature or even in hot climates. A common practice to obtain freeze-dried reagents is to add a sugar (e.g., trehalose) at the time of freeze-drying. Optimization of the trehalose concentration and stability of the freeze-dried reagents for long periods (6 to 12 months) are two key aspects. A trehalose concentration of 15% has generally been reported as optimal in the literature and was confirmed in the inventors' laboratory (Figure 4), although lower concentrations seem to work as well. The reagents were stable for at least one month (FIG. 16).

Example 5.

This example describes isothermal amplification using a helicase enzyme and primers of the present inventions for use in compositions and methods of the present inventions. Helicase-dependent amplification is isothermal (at around 60 °C) and does not require temperature cycling. The inventors assessed the performance of this enzyme under 21 different conditions; the results indicated that less than 10 min was needed for the signals to cross the background threshold. This experiment was conducted at a high target concentration (~10,000 copies). Further tests are needed to evaluate the detection limit, replication, and primer design. The helicase was from BioHelix Corporation (Beverly, Massachusetts, www.biohelix.com/) (FIG. 17).
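The time needed for a signal to cross the background threshold can be estimated from a real-time fluorescence trace, as in the following minimal sketch. The threshold definition (baseline mean plus a multiple of the baseline standard deviation), the number of baseline readings, and the example readings are assumptions made for illustration; the specification does not state how the background threshold was computed.

```python
from statistics import mean, stdev
from typing import List, Optional


def time_to_threshold(times_min: List[float],
                      signal: List[float],
                      n_baseline: int = 5,
                      k: float = 10.0) -> Optional[float]:
    """Estimate when the signal first rises above the background threshold.

    The threshold is assumed to be mean + k * stdev of the first
    n_baseline readings. The crossing time is linearly interpolated
    between the two readings that bracket the threshold. Returns None
    if the signal never crosses.
    """
    baseline = signal[:n_baseline]
    threshold = mean(baseline) + k * stdev(baseline)
    for i in range(n_baseline, len(signal)):
        if signal[i] > threshold:
            t0, t1 = times_min[i - 1], times_min[i]
            s0, s1 = signal[i - 1], signal[i]
            return t0 + (threshold - s0) * (t1 - t0) / (s1 - s0)
    return None


# Hypothetical readings taken once per minute during an isothermal run.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
fluorescence = [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 3.0, 9.0, 20.0]
crossing = time_to_threshold(times, fluorescence)
```

For these illustrative readings the crossing falls between the 5 and 6 minute readings. A trace that stays at baseline returns `None`, which corresponds to a negative reaction under the assumed rule.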

All publications and patents mentioned in the above specification are herein incorporated by reference in their entirety. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in biochemistry, chemistry, molecular biology, microbiology, immunology and water pathogens or related fields are intended to be within the scope of the following claims.

Claims

What is claimed is:

1. A method providing target oligonucleotide probes for detecting a target pathogen comprising, a) generating a maximum number of nonoverlapping probes of a predetermined length from a gene sequence derived from a target microorganism, b) comparing the sequence of each nonoverlapping probe to sequences from a plurality of database microorganisms, c) identifying at least one subregion in said gene sequence derived from a target microorganism, wherein said at least one subregion contains exact matches to less than a predetermined number of said database microorganisms; d) generating a set of overlapping oligonucleotide probes of a predetermined length for said at least one subregion.

2. The method of claim 1, wherein said predetermined length of said nonoverlapping oligonucleotide probes is from 10 to 60 bases.

3. The method of claim 1, wherein said subregion is from about 50 to 500 bases in length.

4. The method of claim 1, wherein multiple subregions containing exact matches to less than a predetermined number of database sequences are identified.

5. The method of claim 4, wherein overlapping oligonucleotide probes of a predetermined length are generated for said multiple subregions.

6. The method of claim 1, further comprising selecting at least one oligonucleotide probe for said at least one subregion.

7. The method of claim 6, wherein said at least one oligonucleotide probe for said at least one subregion has at least one base pair difference as compared to said sequences of said database microorganisms.

8. The method of claim 7, further comprising the step of arraying said at least one probe on a solid surface.

9. A set of nucleic acid probes generated by the method of claim 7.

10. A microarray comprising the set of nucleic acid probes of claim 9.

11. A microarray comprising nucleic acid probes for a target organism, wherein said probes are selected by generating a maximum number of nonoverlapping oligonucleotide probes of a predetermined length from a gene sequence derived from said target organism, comparing the sequence of each nonoverlapping oligonucleotide probe to sequences from a plurality of database microorganisms, identifying at least one subregion in said gene sequence derived from a target microorganism, wherein said at least one subregion contains exact matches to less than a predetermined number of said database microorganisms; and generating a set of overlapping probes of a predetermined length for said at least one subregion.

12. The microarray of claim 11, wherein said oligonucleotide probes are arrayed on a substrate selected from the group consisting of a glass slide and a serpentine chip.

13. The microarray of Claim 11, wherein said target organisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, E. coli, and combinations thereof.

14. The microarray of Claim 13, wherein said oligonucleotide probes are complementary to virulence marker genes from said target organisms.

15. A microarray for high throughput detection of waterborne pathogens comprising sets of multiple discrete nucleic acid probes for at least ten target organisms, wherein said nucleic acid probes are complementary to multiple amplicons from multiple virulence marker genes from each target organism and multiple discrete nucleic acid probes are provided for each amplicon, and wherein said at least ten target microorganisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, and E. coli.

16. The microarray of Claim 15, wherein said nucleic acid probes are selected from the group consisting of oligonucleotide probes encoded by SEQ ID Nos. 1-19206 and combinations thereof.

17. A method for high throughput detection of multiple target organisms comprising: a) providing: i) a microarray comprising sets of multiple discrete nucleic acid probes for multiple target organisms, wherein said nucleic acid probes are complementary to multiple amplicons from multiple target genes of said at least one target organism, and wherein multiple discrete nucleic acid probes are provided for each amplicon; ii) primers complementary to multiple target genes of said multiple target organisms; and b) using said primers to amplify said multiple target genes from said multiple target organisms to produce amplicons; c) contacting said microarray with said amplicons under conditions such that said amplicons hybridize to said discrete nucleic acid probes; and d) detecting the presence of amplicon binding to said discrete nucleic acid probes.

18. The method of Claim 19, wherein said target organisms are selected from the group consisting of Aeromonas hydrophila, Campylobacter jejuni, Clostridium perfringens, Helicobacter pylori, Legionella pneumophila, Listeria monocytogenes, Staphylococcus aureus, Salmonella, Pseudomonas aeruginosa, Vibrio cholerae, Vibrio parahaemolyticus, Yersinia enterocolitica, E. faecalis, S. enterica, C. parvum, G. intestinalis, K. pneumoniae, E. coli, and combinations thereof.

19. The method of Claim 19, wherein said probes are complementary to virulence marker genes from said target organisms.

20. The method of Claim 19, wherein said nucleic acid probes are selected from the group consisting of probes encoded by SEQ ID Nos. 1-19206 and combinations thereof.

21. A set of oligonucleotide probes consisting of at least 2 sequences selected from the group consisting of SEQ ID Nos. 1-19206.

22. A microarray consisting of at least 2 sequences selected from the group consisting of SEQ ID Nos. 1-19206.

23. The microarray of claim 22, wherein said microarray is a serpentine chip.