HK1181819A

HK1181819A - Method for rapid detection and identification of bioagents

Info

Publication number: HK1181819A
Application number: HK13109171.7A
Authority: HK
Inventors: David J. Ecker; Richard H. Griffey; Rangarajan Sampath; Steven Hofstadler; John McNeil
Original assignee: Ibis Biosciences, Inc.
Priority date: 2001-03-02
Filing date: 2013-08-06
Publication date: 2013-11-15

Description

Method for rapid detection and identification of bioagents

This application is a divisional of invention patent application No. 02809122.1, the Chinese national-phase entry of international application PCT/US2002/006763, filed on March 4, 2002, and entitled "Method for rapid detection and identification of bioagents".

Statement of government support

This invention was made with United States Government support under DARPA/SPO contract BAA 00-09. The United States Government may have certain rights in the invention.

Technical Field

The present invention relates to methods for rapid detection and identification of organisms in environmental, clinical or other samples. The methods detect and characterize a unique base composition signature (BCS) from any organism, including bacteria and viruses. The unique BCS is then used to rapidly identify the organism.

Background

Rapid and unequivocal identification of microorganisms is desirable for a variety of industrial, medical, environmental, quality, and research reasons. Traditionally, the microbiology laboratory has identified the causative agents of infectious diseases through direct examination and culture of specimens. Since the mid-1980s, researchers have repeatedly demonstrated the practical utility of molecular biology techniques, many of which form the basis of clinical diagnostic assays. These techniques include nucleic acid hybridization analysis, restriction enzyme analysis, genetic sequence analysis, and separation and purification of nucleic acids (see, e.g., J. Sambrook, E. F. Fritsch and T. Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989). These procedures are generally time-consuming and tedious. Another option is the polymerase chain reaction (PCR) or another amplification procedure, which amplifies a specific target DNA sequence determined by the flanking primers used. Finally, detection and data analysis convert the hybridization event into an analytical result.

Other techniques for detecting organisms include high resolution Mass Spectrometry (MS), low resolution MS, fluorescence, radioiodination, DNA chips, and antibody techniques. None of these techniques is entirely satisfactory.

Mass spectrometry provides detailed information about the molecules being analyzed, including high mass accuracy. It is also a process that can be easily automated. However, high-resolution MS alone cannot be used to analyze unknown or bioengineered organisms, or in environments where high background levels of organisms are present (a "cluttered" background). Low-resolution MS can fail to detect some known organisms if their spectral lines are sufficiently weak or sufficiently close to those of other organisms in the sample. DNA chips with specific probes can only determine the presence or absence of specifically anticipated organisms. Because there are hundreds of thousands of species of benign bacteria, some of which are very similar in sequence to threat organisms, even an array with 10,000 probes lacks the breadth needed to detect a particular organism.

Antibodies face more severe diversity limitations than arrays. If antibodies are designed against highly conserved targets to increase diversity, false alarms become the dominant problem, again because dangerous organisms are very similar to benign ones. Antibodies can only detect known organisms in relatively uncluttered environments.

Several groups have reported detection of PCR products using high-resolution electrospray ionization-Fourier transform-ion cyclotron resonance mass spectrometry (ESI-FT-ICR MS). Accurate measurement of the exact mass, combined with knowledge of the number of at least one nucleotide, allows calculation of the total base composition of double-stranded PCR products of approximately 100 base pairs (Aaserud et al., J. Am. Soc. Mass Spec. 7: 1266-1269, 1996; Muddiman et al., Anal. Chem. 69: 1543-1549, 1997; Wunschel et al., Anal. Chem. 70: 1203-1207, 1998; Muddiman et al., Rev. Anal. Chem. 17: 1-68, 1998). ESI-FT-ICR MS can determine the mass of double-stranded, 500 base-pair PCR products via the average molecular mass (Hurst et al., Rapid Commun. Mass Spec. 10: 377-382, 1996). The use of matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry to characterize PCR products has also been reported (Muddiman et al., Rapid Commun. Mass Spec. 13: 1201-1204, 1999). However, the degradation observed with MALDI for DNA longer than about 75 nucleotides limits the utility of this method.

U.S. Pat. No. 5,849,492 describes a method for retrieving phylogenetically informative DNA sequences, comprising searching for a highly divergent segment of genomic DNA surrounded by two highly conserved segments, designing universal primers for PCR amplification of the highly divergent region, amplifying the genomic DNA by PCR using the universal primers, and then sequencing the gene to determine the identity of the organism.

U.S. Pat. No. 5,965,363 discloses methods for screening for nucleic acid polymorphisms by analyzing amplified target nucleic acids using mass spectrometry techniques, and methods for improving the mass resolution and mass accuracy of these methods.

WO 99/14375 describes methods, PCR primers and kits for use in mass spectrometric analysis of preselected DNA tandem nucleotide repeat alleles.

WO 98/12355 discloses a method for determining the mass of a target nucleic acid by mass spectrometry by cleaving the target nucleic acid to reduce its length, generating a target single strand and determining the mass of the single-stranded shortened target using MS. Also disclosed are methods of preparing a double-stranded target nucleic acid for MS analysis, comprising amplifying the target nucleic acid, binding one strand to a solid support, releasing the second strand, and subsequently releasing the first strand, followed by analysis by MS. Kits for preparing target nucleic acids are also provided.

PCT WO97/33000 discloses a method for detecting mutations in a target nucleic acid by nonrandom fragmentation of the target into a set of single-stranded nonrandom length fragments and measuring their mass by MS.

U.S. Pat. No. 5,605,798 reports a rapid and highly accurate mass spectrometer-based method for detecting the presence of a specific nucleic acid in a biological sample for diagnostic purposes.

WO 98/21066 describes a method for determining the sequence of a particular target nucleic acid by mass spectrometry. Methods for detecting the presence of a target nucleic acid in a biological sample using PCR amplification and mass spectrometric detection are disclosed, as are methods for detecting a target nucleic acid by amplifying a target in a sample using primers containing a restriction enzyme site and a label, extending and cleaving the amplified nucleic acid, and detecting the presence of an extended product, wherein the presence of a DNA fragment having a mass different from that of the wild type indicates a mutation. Methods for determining nucleic acid sequences by mass spectrometry have also been reported.

WO 97/37041, WO 99/31278 and U.S. Pat. No. 5,547,835 describe methods for determining nucleic acid sequences by mass spectrometry. U.S. Pat. Nos. 5,622,824, 5,872,003, and 5,691,141 describe methods, systems, and kits for exonuclease-mediated mass spectrometric sequencing.

Thus, there is a need for a specific and rapid method of organism detection and identification, wherein sequencing of nucleic acids is not required. The present invention meets this need.

Disclosure of Invention

One embodiment of the invention is a method of identifying an unknown organism comprising (a) contacting the nucleic acid of the organism with at least one pair of oligonucleotide primers that hybridize to the nucleic acid sequence and flank a variable nucleic acid sequence; (b) amplifying the variable nucleic acid sequence to produce an amplification product; (c) determining the molecular weight of the amplification product; (d) comparing the molecular weight to the molecular weight of one or more amplification products obtained by performing steps (a) - (c) on a plurality of known organisms, wherein a match identifies the unknown organism. In one aspect of this preferred embodiment, the sequences capable of hybridizing to at least one pair of oligonucleotide primers are highly conserved. Preferably the amplification step comprises polymerase chain reaction. Alternatively, the amplification step comprises ligase chain reaction or strand displacement amplification. In one aspect of this preferred embodiment, the organism is a bacterium, virus, cell or spore. Advantageously, the nucleic acid is ribosomal RNA. In another aspect, the nucleic acid encodes RNase P or an RNA-dependent RNA polymerase. Preferably, the amplification product is ionized prior to molecular weight determination. The method may further comprise the step of isolating the nucleic acid from the organism prior to contacting the nucleic acid with the at least one pair of oligonucleotide primers. The method may further comprise performing steps (a) - (d) using a pair of different oligonucleotide primers and comparing the results to the molecular weight of one or more amplification products obtained from performing steps (a) - (c) on a different plurality of known organisms of step d). Preferably, one or more molecular weights are contained in a database of molecular weights. 
In another aspect of this preferred embodiment, the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment. Advantageously, the molecular weight is determined by mass spectrometry. Preferably the mass spectrometry is Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF or triple quadrupole. The method may further comprise performing step (b) in the presence of an analog of adenine, thymidine, guanosine or cytidine having a different molecular weight than adenosine, thymidine, guanosine or cytidine. In one aspect, the oligonucleotide primer comprises a base analog or substitute base at positions 1 and 2 of each triplet within the primer, wherein the base analog or substitute base binds with greater affinity to its complement than the native base. Preferably, the primer contains a universal base at position 3 of each triplet within the primer. The base analog or substitute base may be 2,6-diaminopurine, propyne T, propyne G, a phenoxazine, or a G-clamp. Preferably, the universal base is inosine, guanidine, uridine, 5-nitroindole, 3-nitropyrrole, dP or dK, or 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.
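As a minimal illustrative sketch of steps (c) and (d) above, the following Python fragment computes the average molecular weight of a single amplified strand and compares a measured value with reference amplicons. The residue masses and end correction are standard textbook averages rather than values from this disclosure, and the 5'-OH/3'-OH end chemistry, the tolerance, the function names and the reference sequences are all assumptions made for illustration.

```python
# Average masses (Da) of nucleotide residues inside a DNA chain.
# Standard textbook values; they do not come from this disclosure.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Correction for a linear strand with 5'-OH and 3'-OH ends (assumed end chemistry).
END_CORRECTION = -61.96


def strand_mass(seq: str) -> float:
    """Average molecular weight of one unmodified DNA strand."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) + END_CORRECTION


def match_by_mass(measured: float, reference: dict, tol: float = 1.0) -> list:
    """Names of reference amplicons whose calculated mass lies within tol Da."""
    return sorted(
        name for name, seq in reference.items()
        if abs(strand_mass(seq) - measured) <= tol
    )


# Hypothetical reference amplicons, far shorter than real ones.
reference = {"organism_1": "ACGTACGTAA", "organism_2": "ACGTACGTAC"}
measured = strand_mass("ACGTACGTAA") + 0.3  # simulated measurement error
print(match_by_mass(measured, reference))
```

Because mass depends only on composition, two amplicons with the same bases in a different order would be indistinguishable in this sketch.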

Another embodiment of the invention is a method of identifying an unknown organism, comprising (a) contacting a nucleic acid of the organism with at least one pair of oligonucleotide primers that hybridize to the nucleic acid sequence and flank a variable nucleic acid sequence; (b) amplifying the variable nucleic acid sequence to produce an amplification product; (c) determining the base composition of the amplification product; (d) comparing the base composition to one or more base compositions of amplification products obtained by performing steps (a) - (c) on a plurality of known organisms, wherein a match identifies the unknown organism. In one aspect of this preferred embodiment, the sequences capable of hybridizing to at least one pair of oligonucleotide primers are highly conserved. Preferably the amplification step comprises polymerase chain reaction. Alternatively, the amplification step comprises ligase chain reaction or strand displacement amplification. In one aspect of this preferred embodiment, the organism is a bacterium, virus, cell or spore. Advantageously, the nucleic acid is ribosomal RNA. In another aspect, the nucleic acid encodes RNase P or an RNA-dependent RNA polymerase. Preferably, the amplification product is ionized prior to molecular weight determination. The method may further comprise the step of isolating the nucleic acid from the organism prior to contacting the nucleic acid with the at least one pair of oligonucleotide primers. The method may further comprise performing steps (a) - (d) using a pair of different oligonucleotide primers and comparing the results to the base composition of one or more amplification products obtained from performing steps (a) - (c) on a different plurality of known organisms of step d). Preferably, the one or more base compositions are contained in a base composition database. 
In another aspect of this preferred embodiment, the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment. Advantageously, the molecular weight is determined by mass spectrometry. Preferably the mass spectrometry is Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF or triple quadrupole. The method may further comprise performing step (b) in the presence of an analog of adenine, thymidine, guanosine or cytidine having a different molecular weight than adenosine, thymidine, guanosine or cytidine. In one aspect, the oligonucleotide primer comprises a base analog or substitute base at positions 1 and 2 of each triplet within the primer, wherein the base analog or substitute base binds with greater affinity to its complement than the native base. Preferably, the primer contains a universal base at position 3 of each triplet within the primer. The base analog or substitute base may be 2,6-diaminopurine, propyne T, propyne G, a phenoxazine, or a G-clamp. Preferably, the universal base is inosine, guanidine, uridine, 5-nitroindole, 3-nitropyrrole, dP or dK, or 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.
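One way step (c) of this embodiment might be carried out from a mass measurement is sketched below, purely for illustration: every base composition of a given strand length is enumerated and those whose calculated mass falls within a tolerance of the measured mass are retained. The residue masses are standard textbook averages, and the known strand length, the tolerance values and the function names are assumptions; the optional `known` argument mimics fixing the count of one nucleotide, for example through a mass-modified analog.

```python
# Average residue masses (Da); textbook values, not taken from this disclosure.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # assumes 5'-OH and 3'-OH ends


def composition_mass(a: int, c: int, g: int, t: int) -> float:
    """Average mass of a strand with the given base counts."""
    return (a * RESIDUE_MASS["A"] + c * RESIDUE_MASS["C"]
            + g * RESIDUE_MASS["G"] + t * RESIDUE_MASS["T"] + END_CORRECTION)


def candidate_compositions(measured, length, tol=0.5, known=None):
    """All (A, C, G, T) counts of the given length within tol Da of measured.

    known optionally fixes the count of one or more bases, e.g. {"A": 2}.
    """
    known = known or {}
    hits = []
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                t = length - a - c - g
                counts = {"A": a, "C": c, "G": g, "T": t}
                if any(counts[b] != n for b, n in known.items()):
                    continue
                if abs(composition_mass(a, c, g, t) - measured) <= tol:
                    hits.append((a, c, g, t))
    return hits


mass = composition_mass(2, 1, 2, 1)  # a hypothetical 6-mer
print(candidate_compositions(mass, 6, tol=0.5))
print(candidate_compositions(mass, 6, tol=1.0))
print(candidate_compositions(mass, 6, tol=1.0, known={"A": 2}))
```

In this toy case, loosening the tolerance admits compositions that differ by an A+T to C+G exchange (about 0.98 Da per strand), and fixing the count of one base removes that ambiguity.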

The present invention also provides a method for detecting a single nucleotide polymorphism in an individual, the method comprising the steps of (a) isolating nucleic acid from the individual; (b) contacting the nucleic acid with oligonucleotide primers that hybridize to regions of the nucleic acid flanking a region containing the potential polymorphism; (c) amplifying the region to produce an amplification product; (d) determining the molecular weight of the amplification product; (e) comparing the molecular weight to the molecular weight of the same region from an individual known to have the polymorphism, wherein if the two molecular weights are the same, the individual has the polymorphism.

In one aspect of this preferred embodiment, the primer is hybridized to a highly conserved sequence. Preferably, the polymorphism is associated with a disease. Alternatively, the polymorphism is a blood group antigen. In one aspect of this preferred embodiment, the amplification step is a polymerase chain reaction. Alternatively, the amplification step is ligase chain reaction or strand displacement amplification. Preferably, the amplification product is ionized prior to mass measurement. In one aspect, the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment. Advantageously, the molecular weight is determined by mass spectrometry. Preferably the mass spectrometry is Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF or triple quadrupole.
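To give a feel for the mass accuracy such a comparison requires, the sketch below tabulates the change in average single-strand mass caused by each possible single-base substitution. The residue masses are standard textbook averages, not values from this disclosure, and the function name is illustrative only.

```python
from itertools import permutations

# Average residue masses (Da); textbook values, not taken from this disclosure.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def substitution_shifts() -> dict:
    """Mass change (Da) of one strand for every single-base substitution X>Y."""
    return {
        f"{old}>{new}": round(RESIDUE_MASS[new] - RESIDUE_MASS[old], 2)
        for old, new in permutations("ACGT", 2)
    }


shifts = substitution_shifts()
for change, delta in sorted(shifts.items(), key=lambda item: abs(item[1])):
    print(f"{change}: {delta:+.2f} Da")
```

Under these assumptions the smallest shift is the A/T exchange at about 9 Da per strand, so a single substitution is resolvable only if the mass of the whole amplified strand is measured to better than that.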

Drawings

FIGS. 1A-1I show consensus diagrams of examples of conserved regions from 16S rRNA (FIGS. 1A-1B) (SEQ ID NO: 3), 23S rRNA (3'-half, FIGS. 1C-1D; 5'-half, FIGS. 1E-1F) (SEQ ID NO: 4), 23S rRNA domain I (FIG. 1G), 23S rRNA domain IV (FIG. 1H) and 16S rRNA domain III (FIG. 1I) that are suitable for use in the present invention. The lines with arrows are examples of regions targeted by primer pairs designed for PCR. The label for each primer pair represents the starting and ending base numbers of the amplified region on the consensus diagram. Bases in capital letters are more than 95% conserved; bases in lower case letters are 90-95% conserved; filled circles are 80-90% conserved; open circles are less than 80% conserved.

FIG. 2 shows a typical primer amplification region of 16S rRNA domain III shown in FIG. 1C.

FIG. 3 shows a schematic of the conserved regions in RNase P. The bases of capital letters are more than 90% conserved; the bases of lower case letters are 80-90% conserved; bases represented by filled circles are 70-80% conserved; bases represented by open circles are less than 70% conserved.

FIG. 4 is a schematic illustration of the determination of base composition signatures using nucleoside analog "tags".

FIG. 5 shows deconvoluted mass spectra of a Bacillus anthracis region with and without the mass tag phosphorothioate A. The two spectra differ in that the measured molecular weight of the sequence containing the mass tag is greater than that of the unmodified sequence.

FIG. 6 shows base composition signature (BCS) mass spectra of PCR products from Staphylococcus aureus (S. aureus 16S_1337F) and Bacillus anthracis (B. anthr. 16S_1337F), amplified using the same primers. The two strands differ by only two (AT → CG) substitutions and are clearly distinguished on the basis of their BCS.

FIG. 7 shows that a single difference between two sequences (A₁₄ in Bacillus anthracis versus A₁₅ in Bacillus cereus) can be easily detected by ESI-TOF mass spectrometry.

FIG. 8 is an ESI-TOF mass spectrum of the Bacillus anthracis spore coat protein sspE 56-mer plus calibrant. The signals unambiguously distinguish B. anthracis from other Bacillus species.

FIG. 9 is an ESI-TOF mass spectrum of a synthetic B. anthracis 16S_1228 duplex (reverse and forward strands). The technique easily distinguishes between the forward and reverse strands.

FIG. 10 is an ESI-FTICR-MS spectrum of a synthetic Bacillus anthracis 16S_1337 46 base pair duplex.

FIG. 11 is an ESI-TOF-MS spectrum of a 56-mer oligonucleotide (3 scans) from the B. anthracis saspB gene with internal mass standards. The internal mass standards are indicated by asterisks.

FIG. 12 is an ESI-TOF-MS spectrum of an internal standard in 5 mM TBA-TFA buffer, showing that charge stripping with tributylammonium trifluoroacetate reduces the most abundant charge state from [M-8H⁺]⁸⁻ to [M-3H⁺]³⁻.

Detailed Description

The present invention combines a non-PCR biomass detection mode, preferably high-resolution MS, with PCR-based BCS technology using "intelligent primers" that hybridize to conserved sequence regions of nucleic acids derived from an organism and flank variable sequence regions that uniquely identify the organism. The high-resolution MS technique is used to determine the molecular weight and base composition signature (BCS) of the amplified sequence region. This unique BCS is then input to a maximum-likelihood detection algorithm for matching against a database of base composition signatures for the same amplified region. The method combines PCR-based amplification (which provides specificity) with a molecular weight detection mode (which provides speed and does not require sequencing of the amplified target) for organism detection and identification.

The method enables detection and identification of organisms very rapidly and accurately compared to existing methods. Furthermore, such rapid detection and identification is possible even when the sample substance is not pure. Thus, the method is useful in a variety of fields, including, but not limited to, environmental testing (e.g., to detect and distinguish between pathogenic and non-pathogenic bacteria in water or other samples), bacterial warfare (to immediately identify organisms and appropriate treatments), pharmacogenetic analysis, and medical diagnostics (including cancer diagnosis based on mutations and polymorphisms; drug resistance and susceptibility testing; screening and/or diagnosing genetic diseases and conditions; diagnosing infectious diseases and conditions). This approach leverages ongoing biomedical research on toxicity, pathogenicity, drug resistance, and genome sequencing, providing a much improved sensitivity, specificity, and reliability over existing methods, with a lower false positive rate.

The method can be used to detect and classify any organism, including bacteria, viruses, fungi and toxins. As an example, when the organism is a biological threat, the information obtained can be used to determine practical information needed for countermeasures, including toxin genes, pathogenicity islands, and antibiotic resistance genes. In addition, the method can be used to identify natural or deliberate genetic engineering events, including chromosome fragment swapping, molecular breeding (gene shuffling), and emerging infectious diseases.

Bacteria share a common set of absolutely required genes. About 250 genes are present in all bacterial species (Proc. Natl. Acad. Sci. U.S.A. 93: 10268, 1996; Science 270: 397, 1995), including those with tiny genomes such as Mycoplasma, Ureaplasma and Rickettsia. These genes encode proteins involved in translation, replication, recombination and repair, transcription, nucleotide metabolism, amino acid metabolism, lipid metabolism, energy generation, uptake, secretion, and the like. Examples of such proteins are DNA polymerase III β, elongation factor TU, heat shock protein groEL, RNA polymerase B, phosphoglycerate kinase, NADH dehydrogenase, DNA ligase, DNA topoisomerase and elongation factor G. Operons can also be targeted using the present methods. An example of an operon is the bfp operon of enteropathogenic E. coli. Multiple core chromosomal genes can be used to classify bacteria at the genus or species level to determine whether an organism poses a potential threat. The method can also be used to detect pathogenicity markers (plasmid or chromosomal) and antibiotic resistance genes to confirm the threat potential of an organism and to guide countermeasures.

A theoretically ideal organism detector would identify, quantify and report the complete nucleic acid sequence of every organism that reached the sensor. The complete sequence of the nucleic acid component of a pathogen would provide all relevant information about the threat, including its identity and the presence of drug resistance or pathogenicity markers. This ideal has not yet been achieved. However, the present invention provides a simple strategy for obtaining information of the same practical value using base composition signatures (BCS). Although the base composition of a gene fragment is not as information-rich as the sequence itself, there is no need to analyze the complete sequence of the gene if the short analyte sequence fragment is properly chosen. A database of reference sequences can be prepared in which each sequence is indexed to a unique base composition signature, so that the presence of the sequence can be accurately inferred from the presence of its signature. The advantage of base composition signatures is that they can be quantitatively measured in a massively parallel fashion using multiplex PCR (in which two or more primer pairs amplify target sequences simultaneously) and mass spectrometry. These multiple primer-amplified regions uniquely identify most threat organisms and ubiquitous background bacteria and viruses. In addition, cluster-specific primer pairs can resolve local clusters of interest (e.g., the anthrax group).
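The indexing of reference sequences by base composition described above can be sketched as follows. This is an illustrative assumption about one possible data structure, not the database actually used; the organism names and the very short sequences are hypothetical.

```python
from collections import Counter, defaultdict


def base_composition(seq: str) -> tuple:
    """Return the (A, C, G, T) counts of a sequence."""
    counts = Counter(seq.upper())
    return tuple(counts.get(base, 0) for base in "ACGT")


def build_bcs_index(reference: dict) -> dict:
    """Map each base composition to the names of the amplicons that share it."""
    index = defaultdict(list)
    for name, amplicon in reference.items():
        index[base_composition(amplicon)].append(name)
    return dict(index)


# Hypothetical amplicons from one primer pair (real amplicons are far longer).
reference = {"agent_1": "AACGT", "agent_2": "ACGTT", "agent_3": "TGCAA"}
index = build_bcs_index(reference)
print(index[(1, 1, 1, 2)])  # signature unique to agent_2
print(index[(2, 1, 1, 1)])  # signature shared by agent_1 and agent_3
```

The shared entry shows why the amplified fragment must be chosen carefully: a signature identifies an organism only when no other reference sequence has the same composition in that region, which is one motivation for combining several primer pairs.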

In the context of the present invention, an "organism" is any organism, living or dead, or a nucleic acid derived from such an organism. Examples of organisms include, but are not limited to, cells (including, but not limited to, human clinical samples, bacterial cells, and other pathogens), viruses, toxin genes, and bioregulatory compounds. Samples may be alive or dead or in a vegetative state (e.g., vegetative bacteria or spores) and may be encapsulated or bioengineered.

As used herein, a "base composition signature" (BCS) is the precise base composition of a selected fragment of a nucleic acid sequence that uniquely identifies a target gene and the source organism. BCS can be considered a unique index of a particular gene.

As used herein, "intelligent primers" are primers that bind to sequence regions flanking an intervening variable region. In a preferred embodiment, the sequence regions flanking the variable region are highly conserved among different species of organisms. For example, the sequence regions may be highly conserved among all Bacillus species. The term "highly conserved" means that the sequence regions exhibit about 80-100% identity, more preferably about 90-100%, and most preferably about 95-100% identity. Examples of intelligent primers that amplify regions of the 16S and 23S rRNA are shown in FIG. 2. The arrows represent primers that bind to highly conserved regions flanking a variable region in 16S rRNA domain III. The amplified region is the stem-loop structure under "1100-1188".

As used herein, "match" refers to any statistically significant probabilistic determination or result.

One major advantage of the detection method of the present invention is that the primers need not be specific for a particular bacterial species or genus, such as Bacillus or Streptomyces. Instead, the primers recognize highly conserved regions across hundreds of bacterial species, including, but not limited to, the species described herein. Thus, the same primer pair can be used to identify any desired bacterium, because it binds to conserved regions that flank a variable region specific to a single species or common to several species, allowing nucleic acid amplification of the intervening sequence and determination of its molecular weight and base composition. For example, the 16S_971-1062, 16S_1228-1310 and 16S_1100-1188 regions are 98-99% conserved in about 900 species of bacteria (16S = 16S rRNA; numbers indicate nucleotide positions). In one embodiment of the invention, the primers used in the method bind to one or more of these regions or portions thereof.

The present invention combines a non-PCR biomass detection mode, preferably high-resolution MS, with nucleic acid amplification-based BCS technology using "intelligent primers" that hybridize to conserved regions and flank variable regions that uniquely identify the organism. Although the use of PCR is preferred, other nucleic acid amplification techniques may also be used, including ligase chain reaction (LCR) and strand displacement amplification (SDA). High-resolution MS can separate the spectral lines of an organism from background spectral lines in highly cluttered environments. The resolved spectral lines are then translated into a BCS, which is input to a maximum-likelihood detection algorithm and matched against one or more known BCSs, preferably drawn from a database of BCSs for a vast number of organisms.
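The maximum-likelihood detection algorithm is not specified in detail here. A minimal sketch of one possible form, assuming that the measured mass deviates from each candidate's expected mass by independent Gaussian error with a known standard deviation and that all candidates are equally likely beforehand, is given below; the agent names, masses and the value of `sigma` are hypothetical.

```python
import math


def log_likelihood(measured: float, expected: float, sigma: float) -> float:
    """Log of a Gaussian density for the measurement given the expected mass."""
    z = (measured - expected) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def most_likely(measured: float, expected_masses: dict, sigma: float = 0.5):
    """Return the best-matching name and normalized probabilities (equal priors)."""
    scores = {
        name: log_likelihood(measured, mass, sigma)
        for name, mass in expected_masses.items()
    }
    best = max(scores, key=scores.get)
    # Subtract the top score before exponentiating to avoid underflow of all terms.
    weights = {name: math.exp(s - scores[best]) for name, s in scores.items()}
    total = sum(weights.values())
    return best, {name: w / total for name, w in weights.items()}


# Hypothetical expected amplicon masses (Da) for three database entries.
database = {"agent_x": 14000.0, "agent_y": 14001.0, "agent_z": 14025.0}
best, probabilities = most_likely(14000.2, database)
print(best, probabilities)
```

Reporting the normalized probabilities alongside the best match gives a direct measure of how decisive the identification is, in keeping with the probabilistic sense in which "match" is used herein.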

In a preferred embodiment, the base composition signature is quantitatively determined in a massively parallel manner using the polymerase chain reaction (PCR), preferably multiplex PCR, and mass spectrometry (MS) methods. Sufficient quantities of nucleic acid must be present for detection of the organism by MS. A variety of techniques for preparing large quantities of purified nucleic acids or fragments thereof are well known to those skilled in the art. PCR requires one or more pairs of oligonucleotide primers that bind to regions flanking the target sequence to be amplified. These primers prime the synthesis of opposite strands of DNA, with synthesis proceeding from each primer toward the other. DNA synthesis is initiated by mixing the primers, the DNA to be amplified, a thermostable DNA polymerase (e.g., Taq polymerase), the four deoxynucleoside triphosphates and a buffer. The solution is denatured by heating and then cooled to allow annealing of the newly added primers, followed by another round of DNA synthesis. This process is typically repeated for about 30 cycles, resulting in amplification of the target sequence.
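The scale of amplification obtained from repeated cycling can be estimated with an idealized exponential model, sketched below. The per-cycle `efficiency` parameter and the function name are assumptions introduced for illustration; real reactions plateau in later cycles, which this sketch ignores.

```python
def expected_copies(initial_copies: float, cycles: int = 30,
                    efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; lower values model incomplete
    extension. Plateau effects are deliberately not modelled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(expected_copies(1))                  # perfect doubling for 30 cycles
print(expected_copies(1, efficiency=0.9))  # a less efficient reaction
```

With perfect doubling, 30 cycles correspond to about a billion-fold increase per starting template, which is what makes a trace amount of target detectable by MS.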

The "intelligent primers" define the target sequence region to be amplified and analyzed. In one embodiment, the target sequence is a ribosomal RNA (rRNA) gene sequence. With the complete sequences of many of the smallest microbial genomes now available, it is possible to identify a set of genes that defines "minimal life" and to identify composition signatures that uniquely distinguish each gene and organism. Genes encoding core life functions such as DNA replication, transcription, ribosome structure, translation and transport are broadly distributed in bacterial genomes and are preferred regions for BCS analysis. Ribosomal RNA (rRNA) genes contain regions that provide useful base composition signatures. Like many genes involved in core life functions, rRNA genes contain sequences that are extraordinarily conserved across bacterial domains, interspersed with regions of high variability that are more specific to each species. These variable regions can be used to construct a database of base composition signatures. The strategy involves creating a structure-based alignment of the sequences of the small (16S) and large (23S) subunits of the rRNA genes. For example, there are currently over 13,000 sequences in the ribosomal RNA database created and maintained by Robin Gutell, University of Texas at Austin, which is publicly available on the Institute for Cellular and Molecular Biology web page (www.rna.icmb.utexas.edu/). There is also a publicly available rRNA database created and maintained by the University of Antwerp, Belgium (www.rrna.uia.ac.be).

These databases have been analyzed to determine regions that are useful as base composition signatures. These regions are characterized by: a) between about 80% and 100%, preferably more than about 95%, identity among the species of interest in the upstream and downstream nucleotide sequences that serve as sequence amplification primer sites; b) an intervening variable region that exhibits no more than about 5% identity among the species; and c) a separation between the conserved regions of about 30 to 1000 nucleotides, preferably no more than about 50-250 nucleotides, and more preferably no more than about 60-100 nucleotides.
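Criteria a) to c) can be expressed as a simple screen over aligned sequences, sketched below. The definition of identity used here (the percentage of alignment columns in which every sequence carries the same base) is an assumption, since the calculation is not defined here, and the toy alignments and function names are hypothetical.

```python
def percent_identity(aligned: list) -> float:
    """Percentage of alignment columns in which all sequences agree."""
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for column in zip(*aligned) if len(set(column)) == 1)
    return 100.0 * same / length


def is_candidate_region(upstream, downstream, variable, separation_nt) -> bool:
    """Apply criteria a) to c) using their outermost stated limits."""
    flanks_conserved = all(
        percent_identity(site) >= 80.0 for site in (upstream, downstream)
    )
    variable_enough = percent_identity(variable) <= 5.0
    spacing_ok = 30 <= separation_nt <= 1000
    return flanks_conserved and variable_enough and spacing_ok


# Toy two-species alignments; the separation is passed in explicitly because
# these illustrative sequences are much shorter than a real variable region.
upstream = ["ACGTACGTAC", "ACGTACGTAC"]
downstream = ["GGCCTTAAGG", "GGCCTTAAGC"]
variable = ["AAAAAAAAAAAAAAAAAAAA", "CCCCCCCCCCCCCCCCCCCC"]
print(is_candidate_region(upstream, downstream, variable, separation_nt=60))
```

The preferred ranges (more than about 95% flank identity, 60-100 nucleotide separation) could be applied in the same way by tightening the thresholds.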

Because they are generally conserved, the flanking rRNA primer sequences serve as excellent "universal" primer binding sites for amplifying the region of interest in most, if not all, bacterial species. The intervening region between each set of primers varies in length and/or composition, and thus provides a unique base composition signature.

The advantage of designing such "intelligent primers" is versatility: the number of primers to be synthesized is minimized, and a single pair of primers can detect many species. These primer pairs can be used to amplify variable regions in these species. Because any variation in these conserved regions among species (due to codon wobble) is likely to occur in the third position of a DNA triplet, oligonucleotide primers can be designed such that the nucleotide corresponding to this position is a base that can bind to more than one nucleotide, referred to herein as a "universal base". For example, under this "wobble" pairing, inosine (I) binds to U, C or A; guanine (G) binds to U or C; and uridine (U) binds to U or C. Other examples of universal bases include nitroindoles such as 5-nitroindole or 3-nitropyrrole (Loakes et al., Nucleosides and Nucleotides 14: 1001-1003, 1995), the degenerate nucleosides dP or dK (Hill et al), an acyclic nucleoside analog containing 5-nitroindazole (Van Aerschot et al., Nucleosides and Nucleotides 14: 1053-1056, 1995) or the purine analog 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide (Sala et al., Nucl. Acids Res. 24: 3302-3306, 1996).
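The placement of a universal base at the third position of each triplet can be illustrated with the short sketch below. The single-letter symbol "I" for inosine, the `frame` parameter that sets where the first triplet starts, the function name and the example primer are assumptions made for illustration.

```python
def add_universal_bases(primer: str, universal: str = "I", frame: int = 0) -> str:
    """Replace the third position of every triplet with a universal base.

    frame is the index of the first base of the first complete triplet.
    """
    return "".join(
        universal if (i - frame) % 3 == 2 else base
        for i, base in enumerate(primer.upper())
    )


# Hypothetical primer covering three codons of a conserved coding region.
print(add_universal_bases("ATGGCCAAG"))           # ATIGCIAAI
print(add_universal_bases("ATGGCCAAG", frame=1))  # ITGICCIAG
```

In a real design the reading frame of the primer site would have to be known, and substitution would likely be limited to positions where variation among species is actually observed.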

In another embodiment of the invention, to compensate for the weaker binding of the "wobble" base, oligonucleotide primers are designed so that the first and second positions of each triplet are occupied by nucleotide analogs that bind with greater affinity than the unmodified nucleotide. Examples of such analogs are 2,6-diaminopurine, which binds to thymine; propyne T, which binds to adenine; and propyne C and phenoxazines, including G-clamp, which bind to G. Propynes are described in U.S. Patent Nos. 5645958, 5830653 and 5494908, the entire contents of which are incorporated herein by reference. Phenoxazines are described in Patent Nos. 5,502,177, 5,763,588, and 6,005,096, the entire contents of which are incorporated herein by reference. G-clamps are described in U.S. Patent Nos. 6,00,992 and 6,028,183, the entire contents of which are incorporated herein by reference.

Bacterial biowarfare agents that can be detected by the present methods include Bacillus anthracis (anthrax), Yersinia pestis (pneumonic plague), Francisella tularensis (tularemia), Brucella suis, Brucella abortus, Brucella melitensis (undulant fever), Burkholderia mallei (glanders), Burkholderia pseudomallei (melioidosis), Salmonella typhi (typhoid fever), Rickettsia typhi (epidemic typhus), Rickettsia prowazekii (endemic typhus) and Coxiella burnetii (Q fever), Rhodobacter capsulatus, Chlamydia pneumoniae, Escherichia coli, Shigella dysenteriae, Shigella flexneri, Bacillus cereus, Clostridium botulinum, Pseudomonas aeruginosa, Legionella pneumophila, and Vibrio cholerae.

In addition to 16S and 23S rRNA, other bacterial target regions suitable for detection according to the invention include 5S rRNA and RNase P (FIG. 3).

Fungal biowarfare agents include Coccidioides immitis (coccidioidomycosis).

Biological warfare toxin genes that can be detected by the methods of the invention include those for botulinum toxin, T-2 mycotoxins, ricin, staphylococcal enterotoxin B, Shiga toxin, abrin, aflatoxin, Clostridium perfringens epsilon toxin, conotoxins, diacetoxyscirpenol, tetrodotoxin and saxitoxin.

Biological warfare viral threat agents are mostly RNA viruses (positive-strand and negative-strand), with the exception of smallpox. Every RNA virus is a family of related viruses (quasispecies). These viruses mutate rapidly, and the potential for engineered strains (natural or deliberate) is very high. RNA viruses cluster into families that have conserved RNA structural domains on the viral genome (e.g., virion components, accessory proteins) and conserved housekeeping genes that encode core viral proteins including, for single-stranded positive-strand RNA viruses, RNA-dependent RNA polymerase, double-stranded RNA helicase, chymotrypsin-like and papain-like proteases, and methyltransferases.

Examples of (-)-strand RNA viruses include arenaviruses (e.g., Sabia virus, Lassa fever, Machupo, Argentine hemorrhagic fever, Flexal virus), bunyaviruses (e.g., hantavirus, nairovirus, phlebovirus, Hantaan virus, Crimean-Congo hemorrhagic fever, Rift Valley fever), and mononegavirales (e.g., filovirus, paramyxovirus, Ebola virus, Marburg virus, equine morbillivirus).

Examples of (+)-strand RNA viruses include picornaviruses (e.g., coxsackievirus, echovirus, human coxsackievirus A, human echovirus, human enterovirus, human poliovirus, hepatitis A virus, human parechovirus, human rhinovirus), astroviruses (e.g., human astrovirus), calciviruses (e.g., Chiba virus, Chitta virus, human calcivirus, Norwalk virus), nidovirales (e.g., human coronavirus, human torovirus), flaviviruses (e.g., dengue virus 1-4, Japanese encephalitis virus, Kyasanur forest disease virus, Murray Valley encephalitis virus, Rocio virus, St. Louis encephalitis virus, West Nile virus, yellow fever virus, hepatitis C virus), and togaviruses (e.g., Chikungunya virus, eastern equine encephalitis virus, Mayaro virus, O'nyong-nyong virus, Ross River virus, Venezuelan equine encephalitis virus, rubella virus, hepatitis E virus). Hepatitis C virus has a 5'-untranslated region of 340 nucleotides, an open reading frame encoding 9 proteins having 3010 amino acids, and a 3'-untranslated region of 240 nucleotides. The 5'-UTR and 3'-UTR are 99% conserved in hepatitis C viruses.

In one embodiment, the target gene is an RNA-dependent RNA polymerase or a helicase encoded by a (+)-strand RNA virus, or an RNA polymerase from a (-)-strand RNA virus. (+)-strand RNA viruses are double-stranded RNA and replicate by RNA-directed RNA synthesis using an RNA-dependent RNA polymerase and the positive strand as a template. Helicase unwinds the RNA duplex to allow replication of the single-stranded RNA. These viruses include those of the picornavirus family (e.g., poliovirus, coxsackievirus, echovirus), togaviruses (e.g., alphavirus, flavivirus, rubella virus), arenaviruses (e.g., lymphocytic choriomeningitis virus, Lassa fever virus), coronaviruses (e.g., human respiratory virus), and hepatitis A virus. The genes encoding these proteins comprise a variable region and highly conserved regions flanking the variable region.

In a preferred embodiment, the detection scheme for the PCR products generated from an organism has three features. First, the technique simultaneously detects and differentiates multiple (generally about 6-10) PCR products. Second, the technique provides a BCS that uniquely identifies the organism from the possible primer sites. Finally, the detection technique is rapid, allowing multiple PCR reactions to be run in parallel.

In one embodiment, the method can be used to detect the presence of antibiotic resistance and/or toxin genes in a bacterial species. For example, Bacillus anthracis carrying a tetracycline resistance plasmid and plasmids encoding one or both anthrax toxins (pXO1 and/or pXO2) can be detected by using an antibiotic resistance primer pair and a toxin gene primer pair. If the B. anthracis is positive for tetracycline resistance, a different antibiotic, for example a quinolone, can be used.

Mass spectrometry (MS)-based detection of PCR products provides all of these features, with additional advantages. MS is intrinsically a parallel detection scheme that needs no radioactive or fluorescent labels, since every amplification product with a unique base composition is identified by its molecular mass. The current state of the art in mass spectrometry is such that less than femtomole quantities of material can be readily analyzed to provide information about the molecular contents of the sample. An accurate assessment of the molecular mass of the material can be obtained quickly, whether the molecular weight of the sample is several hundred or in excess of one hundred thousand atomic mass units (amu) or daltons. Intact molecular ions can be generated from amplification products using one of a variety of ionization techniques that convert the sample to the gas phase. These ionization methods include, but are not limited to, electrospray ionization (ESI), matrix-assisted laser desorption ionization (MALDI) and fast atom bombardment (FAB). For example, MALDI of nucleic acids, along with examples of matrices for use in MALDI of nucleic acids, is described in WO 98/54751 (Genetrace, Inc.).

Upon ionization, several peaks are observed from one sample because ions with different charges are formed. Averaging the multiple readings of molecular mass obtained from a single mass spectrum gives an estimate of the molecular mass of the amplification product. Electrospray ionization mass spectrometry (ESI-MS) is particularly useful for very high molecular weight polymers, such as proteins and nucleic acids with molecular weights greater than 10 kDa, since it yields a distribution of multiply charged molecules of the sample without causing a significant amount of fragmentation.
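By way of illustration only, the averaging of several charge-state readings into one mass estimate may be sketched as follows. The sketch assumes negative-ion spectra in which each peak is a deprotonated ion [M-zH]^z-, that the supplied peaks are consecutive charge states of a single species, and that the function names are hypothetical; it is not a description of the deconvolution software used with the instruments described herein.

```python
PROTON = 1.007276  # mass of a proton in Da


def assign_charges(peaks_mz):
    """Assign charges to m/z peaks of consecutive charge states.

    For neighbouring peaks p0 < p1 with charges z+1 and z,
    (p0 + PROTON) / (p1 - p0) equals z, the charge of the higher m/z peak.
    """
    peaks = sorted(peaks_mz)
    z_second = round((peaks[0] + PROTON) / (peaks[1] - peaks[0]))
    # The lowest m/z peak carries the highest charge.
    return [(mz, z_second + 1 - i) for i, mz in enumerate(peaks)]


def neutral_mass(mz, z):
    """Neutral mass M of an ion [M - zH]^z- observed at the given m/z."""
    return z * (mz + PROTON)


def deconvolve(peaks_mz):
    """Average the neutral-mass readings from all charge states."""
    masses = [neutral_mass(mz, z) for mz, z in assign_charges(peaks_mz)]
    return sum(masses) / len(masses)


# Synthetic peaks for a 14072.26 Da strand at charge states 8- to 11-.
true_mass = 14072.26
peaks = [true_mass / z - PROTON for z in (8, 9, 10, 11)]
print(assign_charges(peaks))
print(deconvolve(peaks))
```

In a real spectrum each reading carries its own measurement error, which is why averaging over charge states tightens the estimate.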

Mass detectors useful in the methods of the invention include, but are not limited to, Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion traps, quadrupoles, magnetic sectors, time of flight (TOF), Q-TOF, or triple quadrupoles.

Mass spectrometry techniques generally useful in the present invention include, but are not limited to, tandem mass spectrometry, infrared multiphoton dissociation, and pyrolytic gas chromatography mass spectrometry (PGC-MS). In one embodiment of the invention, the organism detection system operates continuously in an organism detection mode using pyrolytic GC-MS without PCR for rapid detection of increases in biomass (for example, increases in fecal contamination of drinking water or in germ warfare agents). To achieve minimal latency, a continuous sample stream flows directly into the PGC-MS combustion chamber. When an increase in biomass is detected, a PCR process is automatically initiated. The presence of an organism produces elevated levels of the 100-Da large molecular fragments, which are observed in the PGC-MS spectrum. The observed mass spectrum is compared to a threshold level, and when the measured biomass level exceeds a predetermined threshold, the organism classification process described hereinabove (combining PCR and MS, preferably FT-ICR MS) is initiated. Optionally, alarms or other processes (halting ventilation air flow, physical isolation) are also initiated by this detected biomass level.

The accurate measurement of molecular mass for large DNAs is limited by the adduction of cations from the PCR reaction to each strand, by the resolution of the isotopic peaks from natural-abundance ¹³C and ¹⁵N isotopes, and by the assignment of the charge state for any ion. The cations are removed by in-line dialysis using a flow-through chip that brings the solution containing the PCR products into contact with a solution containing ammonium acetate in the presence of an electric field gradient orthogonal to the flow. The latter two problems are addressed by operating with a resolving power of > 10,000 and by incorporating isotopically depleted nucleoside triphosphates into the DNA. The resolving power of the instrument is also a consideration. At a resolving power of 10,000, the modeled signal from the [M-14H⁺]¹⁴⁻ charge state of an 84-mer PCR product is poorly characterized, and assignment of the charge state or exact mass is impossible. At a resolving power of 33,000, the peaks from the individual isotopic components are visible. At a resolving power of 100,000, the isotopic peaks are resolved to the baseline, and assignment of the charge state for the ion is straightforward. The [¹³C,¹⁵N]-depleted triphosphates are obtained, for example, by growing microorganisms on depleted media and harvesting the nucleotides (Batey et al., Nucl. Acids Res. 20: 4515-4523, 1992).
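By way of illustration only, the dependence of isotopic peak resolution on resolving power may be estimated with a rough rule: adjacent isotopic peaks of an ion of charge z lie about 1.003355/z apart in m/z, and they begin to separate when the peak width (m/z divided by resolving power, FWHM definition) falls below that spacing. The 26,000 Da mass assumed below for an 84-mer strand is an approximation introduced for this example, as are the function names; the criterion is a simplification and not the modeling used to produce the results described above.

```python
C13_SPACING = 1.003355  # Da, mass difference between 13C and 12C


def isotope_spacing_mz(z):
    """Spacing in m/z between adjacent isotopic peaks of an ion of charge z."""
    return C13_SPACING / z


def peak_width_mz(mz, resolving_power):
    """FWHM peak width at a given m/z, using R = (m/z) / FWHM."""
    return mz / resolving_power


def isotopes_separated(neutral_mass, z, resolving_power):
    """Rough test: is the peak width narrower than the isotopic spacing?"""
    mz = neutral_mass / z  # the proton mass is neglected at this level of detail
    return peak_width_mz(mz, resolving_power) < isotope_spacing_mz(z)


ASSUMED_84MER_MASS = 26000.0  # Da, approximate value assumed for illustration
for r in (10_000, 33_000, 100_000):
    print(r, isotopes_separated(ASSUMED_84MER_MASS, 14, r))
```

Under these assumptions the rule is independent of charge state and reduces to a resolving power greater than about M/1.003, or roughly 26,000 for the assumed mass, which is in line with the qualitative behaviour reported at 10,000, 33,000 and 100,000.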

While mass measurements of intact nucleic acid regions are believed to be adequate to determine most organisms, tandem mass spectrometry (MSⁿ) techniques may provide more definitive information pertaining to molecular identity or sequence. Tandem MS involves the coupled use of two or more stages of mass analysis, where both the separation and detection steps are based on mass spectrometry. The first stage is used to select an ion or component of a sample from which further structural information is to be obtained. The selected ion is then fragmented using, for example, blackbody irradiation, infrared multiphoton dissociation, or collisional activation. For example, ions generated by electrospray ionization (ESI) can be fragmented using IR multiphoton dissociation. This activation leads to dissociation of glycosidic bonds and the phosphate backbone, producing two series of fragment ions, called the w-series (having an intact 3' terminus and a 5' phosphate following internal cleavage) and the a-Base series (having an intact 5' terminus and a 3' furan).

The second stage of mass analysis is then used to detect and measure the mass of the resulting fragments of product ions. Such ion selection followed by fragmentation can be performed multiple times so as to essentially completely dissect the molecular sequence of a sample.

If there are two or more targets of similar base composition or mass, or if a single amplification reaction yields a product that has the same mass as two or more organism reference standards, they can be distinguished by using mass-modifying "tags". In this embodiment of the invention, a nucleotide analog or "tag" that has a different molecular weight than the unmodified base (e.g., 5-(trifluoromethyl)deoxythymidine triphosphate) is incorporated during amplification to improve the distinction of masses. Such tags are described in PCT WO 97/33000. This further limits the number of possible base compositions consistent with any mass. For example, 5-(trifluoromethyl)deoxythymidine triphosphate can be used in place of dTTP in a separate nucleic acid amplification reaction. Measurement of the mass shift between a conventional amplification product and the tagged product is used to quantitate the number of thymidine nucleotides in each of the single strands. Because the strands are complementary, the number of adenosine nucleotides in each strand is also determined.

In another amplification reaction, the number of G and C residues in each strand is determined, for example, using the cytidine analog 5-methylcytosine (5-meC) or propyne C. Combining the A/T reaction and the G/C reaction, and then determining the molecular weight, provides a unique base composition. This process is summarized in fig. 4 and table 1.

TABLE 1

The mass tag phosphorothioate A (A*) was used to distinguish a Bacillus anthracis cluster. B. anthracis (A₁₄G₉C₁₄T₉) had an average MW of 14072.26, B. anthracis (A₁A*₁₃G₉C₁₄T₉) had an average molecular weight of 14281.11, and the phosphorothioate A had an average molecular weight shift of +16.06, as determined by ESI-TOF MS. The deconvoluted spectra are shown in FIG. 5.
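By way of illustration only, the number of tagged residues can be recovered from the figures quoted above for the phosphorothioate A tag by dividing the total mass shift by the shift per tag. The function name is hypothetical; the three numbers are those stated in the text.

```python
def tagged_residue_count(untagged_mass, tagged_mass, shift_per_tag):
    """Estimate how many residues carry the mass tag from the total mass shift."""
    return round((tagged_mass - untagged_mass) / shift_per_tag)


untagged = 14072.26   # B. anthracis product, average MW
tagged = 14281.11     # same product with phosphorothioate A incorporated
shift = 16.06         # average mass shift per phosphorothioate A

n_tagged = tagged_residue_count(untagged, tagged, shift)
residual = (tagged - untagged) - n_tagged * shift
print(n_tagged)            # 13 tagged A residues
print(round(residual, 2))  # mass left unexplained by the tags, in Da
```

The total shift of 208.85 Da corresponds to 13 tagged residues, in agreement with the A₁A*₁₃ composition given for the tagged product.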

In another example, assume that the measured molecular masses of the two strands are 30,000.115 Da and 31,000.115 Da, respectively, and that the measured numbers of dT and dA residues are (30, 28) and (28, 30). If the molecular mass is accurate to 100 ppm, there are 7 possible combinations of dG + dC for each strand. However, if the measured molecular mass is accurate to 10 ppm, there are only 2 combinations of dG + dC, and at 1 ppm accuracy there is only one possible base composition for each strand.
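By way of illustration only, the way in which mass accuracy limits the number of consistent dG + dC combinations may be sketched as an enumeration. The residue masses below are commonly tabulated monoisotopic values for DNA nucleotide residues, and the end-group correction assumes a strand with 5'-OH and 3'-OH termini; both are assumptions of this example and are not necessarily the values or conventions used to compute the figures in the text, so the sketch is not expected to reproduce the counts of 7, 2 and 1 stated above.

```python
# Monoisotopic masses of DNA nucleotide residues in Da (assumed values).
RESIDUE = {"A": 313.05761, "G": 329.05253, "C": 289.04638, "T": 304.04604}
# Add H2O for the chain ends and remove HPO3 (assumes 5'-OH and 3'-OH termini).
END_GROUPS = 18.01056 - 79.96633


def strand_mass(a, g, c, t):
    """Neutral monoisotopic mass of a single strand with the given base counts."""
    return (a * RESIDUE["A"] + g * RESIDUE["G"]
            + c * RESIDUE["C"] + t * RESIDUE["T"] + END_GROUPS)


def gc_candidates(measured, n_a, n_t, ppm, max_count=200):
    """All (G, C) counts whose mass matches the measurement within the ppm tolerance.

    The A and T counts are taken as known, for example from mass-tag reactions.
    """
    tol = measured * ppm * 1e-6
    hits = []
    for g in range(max_count + 1):
        for c in range(max_count + 1):
            if abs(strand_mass(n_a, g, c, n_t) - measured) <= tol:
                hits.append((g, c))
    return hits


# Hypothetical strand with 28 A, 20 G, 19 C and 30 T.
measured = strand_mass(28, 20, 19, 30)
for ppm in (100, 10, 1):
    print(ppm, gc_candidates(measured, 28, 30, ppm))
```

Tightening the tolerance can only shrink the candidate list, which is the point made in the text: better mass accuracy leaves fewer base compositions to choose from.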

Signals from the mass spectrometer may be input to a maximum-likelihood detection and classification algorithm such as is widely used in radar signal processing. The detection processing uses matched filtering of BCS observed in mass-base count space, which allows for detection and subtraction of signatures from known, harmless organisms and for detection of unknown biological threats. Newly observed organisms can also be compared to known organisms to estimate the threat level, by comparing their BCS to those of known organisms and to known forms of pathogenicity enhancement, such as insertion of antibiotic resistance genes or toxin genes.

Processing may end with a Bayesian classifier using log likelihood ratios developed from the observed signals and average background levels. The program emphasizes performance predictions culminating in probability-of-detection versus probability-of-false-alarm plots for conditions involving complex backgrounds of naturally occurring organisms and environmental contaminants. Matched filters consist of a priori expectations of signal values given the set of primers used for each organism. A genomic sequence database (e.g., GenBank) is used to define the mass-base count matched filters. The database contains known threat agents and benign background organisms. Maximum-likelihood detection of known background organisms is implemented using matched filters and a running-sum estimate of the noise covariance. Background signal strengths are estimated and used along with the matched filters to form signatures, which are then subtracted. The maximum-likelihood process is applied to this "cleaned up" data in a similar manner, employing matched filters for the organisms and a running-sum estimate of the noise covariance for the cleaned-up data.
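By way of illustration only, background subtraction followed by a matched-filter log likelihood ratio may be sketched as follows. The sketch simplifies the processing described above: it assumes white Gaussian noise of known variance in place of a running-sum covariance estimate, a single background signature, and hypothetical signature vectors and function names. It is not the algorithm of the invention.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def subtract_background(observed, background):
    """Least-squares estimate of the background strength, then subtraction.

    Returns the cleaned signal and the estimated amplitude. The estimate is
    biased when the background and threat signatures overlap.
    """
    amplitude = dot(observed, background) / dot(background, background)
    cleaned = [x - amplitude * b for x, b in zip(observed, background)]
    return cleaned, amplitude


def log_likelihood_ratio(observed, signature, noise_var):
    """LLR of 'signature present' versus 'noise only' in white Gaussian noise."""
    return (dot(observed, signature) - 0.5 * dot(signature, signature)) / noise_var


# Hypothetical signatures over four mass-base count bins.
background = [1.0, 0.0, 1.0, 0.0]
threat = [0.0, 1.0, 0.0, 1.0]
observed = [2.0, 1.0, 2.0, 1.0]  # two units of background plus the threat

cleaned, amplitude = subtract_background(observed, background)
print(amplitude)                                    # estimated background strength
print(log_likelihood_ratio(cleaned, threat, 0.1))   # positive: threat favoured
```

A positive log likelihood ratio favours the presence of the signature; comparing it to a threshold sets the trade-off between detection probability and false alarm probability.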

In one embodiment, a strategy to "triangulate" each organism by measuring signals from multiple core genes is used to reduce false negative and false positive signals, and to enable reconstruction of the origin of hybrid or otherwise engineered organisms. After identification of multiple core genes, alignments are created from nucleic acid sequence databases. The alignments are then analyzed for regions of conservation and variation, and potential primer binding sites flanking variable regions are identified. Next, amplification target regions for signature analysis are selected that distinguish organisms based on specific genomic differences (i.e., base composition). For example, detection of signatures for the three toxin genes typical of B. anthracis (Bowen, J.E. and C.P. Quinn, J. Appl. Microbiol. 1999, 87, 270-278) in the absence of the expected signatures from the B. anthracis genome would suggest a genetic engineering event.

The method can also be used to detect single nucleotide polymorphisms (SNPs), or multiple nucleotide polymorphisms, rapidly and accurately. A SNP is defined as a single base pair site in the genome that differs from one individual to another. The difference can be expressed as a deletion, an insertion or a substitution, and is frequently linked to a disease state. Because they occur every 100-1000 base pairs, SNPs are the most common type of genetic marker in the human genome.

For example, sickle cell anemia results from an A-T transition, which encodes a valine rather than a glutamic acid residue. Oligonucleotide primers may be designed such that they bind to sequences that flank a SNP site, followed by nucleotide amplification and mass determination of the amplified product. Because the molecular mass of the product from an individual who does not have sickle cell anemia differs from that of the product from an individual who has the disease, the method can be used to distinguish the two individuals. Thus, the method can be used to detect any known SNP in an individual and thereby diagnose, or determine increased susceptibility to, a disease or condition.

In one embodiment, blood is drawn from an individual, and peripheral blood mononuclear cells (PBMCs) are isolated and simultaneously tested, preferably in a high-throughput screening method, for one or more SNPs using appropriate primers based on the known sequences that flank the SNP region. The National Center for Biotechnology Information maintains a publicly available database of SNPs (www.ncbi.nlm.nih.gov/SNP/).

The method of the invention may also be used for blood typing. The gene encoding A, B or O blood type can differ by four single nucleotide polymorphisms. If the gene contains the sequence CGTGGTGACCCTT (SEQ ID NO: 5), antigen A results. If the gene contains the sequence CGTCGTCACCGCTA (SEQ ID NO: 6), antigen B results. If the gene contains the sequence CGTGGT-ACCCCTT (SEQ ID NO: 7), blood group O results ("-" indicates a deletion). These sequences can be distinguished by designing a single primer pair that flanks these regions, followed by amplification and mass determination.
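By way of illustration only, the base compositions of the three allele segments quoted above can be tallied to show that each yields a different composition, and therefore a different mass, once the shared flanking sequence is accounted for. The dictionary and function names are hypothetical; the sequences are those given in the text, with the deletion character removed before counting.

```python
from collections import Counter

# Allele segments as quoted in the text ("-" marks a deletion).
ALLELES = {
    "A": "CGTGGTGACCCTT",
    "B": "CGTCGTCACCGCTA",
    "O": "CGTGGT-ACCCCTT",
}


def base_composition(seq):
    """Return the (A, G, C, T) counts of a sequence, ignoring deletion marks."""
    counts = Counter(seq.replace("-", ""))
    return tuple(counts[base] for base in "AGCT")


compositions = {name: base_composition(seq) for name, seq in ALLELES.items()}
for name, comp in compositions.items():
    print(name, dict(zip("AGCT", comp)))

# All three compositions differ, so the products are distinguishable in principle.
print(len(set(compositions.values())) == len(compositions))
```

Because any flanking sequence amplified by the shared primer pair adds the same bases to every product, the differences between alleles are preserved in the full amplicons.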

While the present invention has been described in detail with reference to certain preferred embodiments, the following examples are illustrative of the invention and are not intended to limit the invention.

Example 1

Nucleic acid isolation and PCR

In one embodiment, the nucleic acid of the organism is isolated and amplified by PCR using standard methods followed by mass spectrometry for BCS. Nucleic acids are isolated, for example, by lysis of bacterial cells with detergents, centrifugation and ethanol precipitation. Nucleic acid isolation procedures are described, for example, in Current Protocols in Molecular Biology (Ausubel et al) and Molecular cloning: a laboratory Manual (Sambrook et al). The nucleic acid is then amplified by standard methods, such as PCR, using primers that bind to conserved regions of the nucleic acid, which contain the intervening variable sequences described below.

Example 2

Mass spectrometry

FTICR instrumentation: The FTICR instrument is based on a 7 tesla actively shielded superconducting magnet and modified Bruker Daltonics Apex II 70e ion optics and vacuum chamber. The spectrometer is interfaced to a LEAP PAL autosampler and a custom fluidics control system for high-throughput screening applications. Samples are analyzed directly from 96-well or 384-well microtiter plates at a rate of about 1 sample/minute. The Bruker data-acquisition platform is supplemented with a lab-built ancillary NT data station that controls the autosampler and contains an arbitrary waveform generator capable of generating complex rf-excitation waveforms (frequency sweeps, filtered noise, stored waveform inverse Fourier transform (SWIFT), etc.) for sophisticated tandem MS experiments. For 20-30-mer oligonucleotides, typical performance characteristics include mass resolving power in excess of 100,000 (FWHM), low-ppm mass measurement errors, and an operable m/z range between 50 and 5000 m/z.

Modified ESI source: In sample-limited analyses, analyte solutions are delivered at 150 nL/minute to a 30 mm i.d. fused-silica ESI emitter mounted on a 3-D micromanipulator. The ESI ion optics consist of a heated metal capillary, an rf-only hexapole, a skimmer cone, and an auxiliary gate electrode. The 6.2 cm rf-only hexapole is composed of 1 mm diameter rods and is operated at a voltage of 380 Vpp at a frequency of 5 MHz. A lab-built electro-mechanical shutter can be employed to prevent the electrospray plume from entering the inlet capillary unless triggered to the "open" position by a TTL pulse from the data station. When in the "closed" position, a stable electrospray plume is maintained between the ESI emitter and the face of the shutter. The back face of the shutter arm contains an elastomeric seal that can be positioned to form a vacuum seal with the inlet capillary. When the seal is removed, a 1 mm gap between the shutter blade and the capillary inlet allows constant pressure in the external ion reservoir regardless of whether the shutter is in the open or closed position. When the shutter is triggered, a "time slice" of ions is allowed to enter the inlet capillary and is subsequently accumulated in the external ion reservoir. The rapid response time of the ion shutter (< 25 ms) provides reproducible, user-defined intervals during which ions can be injected into and accumulated in the external ion reservoir.

Apparatus for infrared multiphoton dissociation: A 25 watt CW CO₂ laser operating at 10.6 μm has been interfaced to the spectrometer to enable infrared multiphoton dissociation (IRMPD) for oligonucleotide sequencing and other tandem MS applications. An aluminum optical bench is positioned approximately 1.5 m from the actively shielded superconducting magnet such that the laser beam is aligned with the central axis of the magnet. Using standard IR-compatible mirrors and kinematic mirror mounts, the unfocused 3 mm laser beam is aligned to pass directly through the 3.5 mm holes in the trapping electrodes of the FTICR trapped ion cell and longitudinally through the hexapole region of the external ion guide, finally impinging on the skimmer cone. This scheme allows IRMPD to be conducted in an m/z-selective manner in the trapped ion cell (e.g., following a SWIFT isolation of the species of interest), or in a broadband mode in the high-pressure region of the external ion reservoir, where collisions with neutral molecules stabilize IRMPD-generated metastable fragment ions, resulting in increased fragment ion yield and sequence coverage.

Example 3

Identification of organisms

Table 2 shows a small cross section of a database of calculated molecular masses for over 9 primer sets and approximately 30 organisms. The primer sets were derived from rRNA alignment. Examples of regions from rRNA consensus alignments are shown in FIGS. 1A-1C. Lines with arrows are examples of regions to which intelligent primer pairs for PCR are designed. The primer pairs are > 95% conserved in the bacterial sequence database (currently over 10,000 organisms). The intervening regions are variable in length and/or composition, thus providing the base composition "signature" (BCS) for each organism. Primer pairs were chosen so that the total length of the amplified region is less than about 80-90 nucleotides. The label for each primer pair represents the starting and ending base numbers of the amplified region on the consensus diagram.

Included in the short bacterial database cross section in Table 2 are many well-known pathogens/biowarfare agents (shown in bold/red typeface), such as Bacillus anthracis or Yersinia pestis, as well as some of the bacterial organisms commonly found in the natural environment, such as Streptomyces. Even closely related organisms can be distinguished from each other by the appropriate choice of primers. For instance, two low G+C organisms, Bacillus anthracis and Staphylococcus aureus, can be distinguished from each other by using the primer pair defined by 16S_1337 or 23S_855 (ΔM of 4 Da).

TABLE 2

Cross section of a database of calculated molecular masses¹

¹ Molecular mass distribution of PCR-amplified regions for a selection of organisms (rows) across various primer pairs (columns). Pathogens are shown in bold. Empty cells indicate presently incomplete or missing data.

FIG. 6 shows the use of ESI-FT-ICR MS for measurement of exact mass. The spectra from 46-mer PCR products originating at position 1337 of the 16S rRNA from S. aureus (upper) and B. anthracis (lower) are shown. These data are from the region of the spectrum containing signals from the [M-8H⁺]⁸⁻ charge states of the respective 5'-3' strands. The two strands differ by two (AT → CG) substitutions and have measured masses of 14206.396 and 14208.373 ± 0.010 Da, respectively. The possible base compositions derived from the masses of the forward and reverse strands of the B. anthracis product are listed in Table 3.

TABLE 3 possible base composition of Bacillus anthracis products

Calculated mass	Mass error	Base composition
14208.2935	0.079520	A1 G17 C10 T18
14208.3160	0.056980	A1 G20 C15 T10
14208.3386	0.034440	A1 G23 C20 T2
14208.3074	0.065560	A6 G11 C3 T26
14208.3300	0.043020	A6 G14 C8 T18
14208.3525	0.020480	A6 G17 C13 T10
14208.3751	0.002060	A6 G20 C18 T2
14208.3439	0.029060	A11 G8 C1 T26
14208.3665	0.006520	A11 G11 C6 T18
14208.3890	0.016020	A11 G14 C11 T10
14208.4116	0.038560	A11 G17 C16 T2
14208.4030	0.029980	A16 G8 C4 T18
14208.4255	0.052520	A16 G11 C9 T10
14208.4481	0.075060	A16 G14 C14 T2
14208.4395	0.066480	A21 G5 C2 T18
14208.4620	0.089020	A21 G8 C7 T10
14079.2624	0.080600	A0 G14 C13 T19
14079.2849	0.058060	A0 G17 C18 T11
14079.3075	0.035520	A0 G20 C23 T3
14079.2538	0.089180	A5 G5 C1 T35
14079.2764	0.066640	A5 G8 C6 T27
14079.2989	0.044100	A5 G11 C11 T19
14079.3214	0.021560	A5 G14 C16 T11
14079.3440	0.000980	A5 G17 C21 T3
14079.3129	0.030140	A10 G5 C4 T27
14079.3354	0.007600	A10 G8 C9 T19
14079.3579	0.014940	A10 G11 C14 T11
14079.3805	0.037480	A10 G14 C19 T3
14079.3494	0.006360	A15 G2 C2 T27
14079.3719	0.028900	A15 G5 C7 T19
14079.3944	0.051440	A15 G8 C12 T11
14079.4170	0.073980	A15 G11 C17 T3
14079.4084	0.065400	A20 G2 C5 T19
14079.4309	0.087940	A20 G5 C10 T13

Of the 16 compositions of the forward strand and 18 compositions of the reverse strand calculated, only one pair (shown in bold) is complementary, corresponding to the actual base composition of the B.anthracis PCR product.
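By way of illustration only, the complementarity constraint can be applied to the candidate compositions of Table 3 as a simple filter: a forward-strand composition (A, G, C, T) is consistent with a reverse-strand composition only if the reverse strand has A and T counts swapped and G and C counts swapped. The list and function names are hypothetical; the compositions are copied from Table 3 as printed.

```python
# Candidate compositions from Table 3 as (A, G, C, T) counts.
FORWARD = [
    (1, 17, 10, 18), (1, 20, 15, 10), (1, 23, 20, 2), (6, 11, 3, 26),
    (6, 14, 8, 18), (6, 17, 13, 10), (6, 20, 18, 2), (11, 8, 1, 26),
    (11, 11, 6, 18), (11, 14, 11, 10), (11, 17, 16, 2), (16, 8, 4, 18),
    (16, 11, 9, 10), (16, 14, 14, 2), (21, 5, 2, 18), (21, 8, 7, 10),
]
REVERSE = [
    (0, 14, 13, 19), (0, 17, 18, 11), (0, 20, 23, 3), (5, 5, 1, 35),
    (5, 8, 6, 27), (5, 11, 11, 19), (5, 14, 16, 11), (5, 17, 21, 3),
    (10, 5, 4, 27), (10, 8, 9, 19), (10, 11, 14, 11), (10, 14, 19, 3),
    (15, 2, 2, 27), (15, 5, 7, 19), (15, 8, 12, 11), (15, 11, 17, 3),
    (20, 2, 5, 19), (20, 5, 10, 13),
]


def is_complement(fwd, rev):
    """True if rev has the base counts of the strand complementary to fwd."""
    a, g, c, t = fwd
    return rev == (t, c, g, a)


def complementary_pairs(forward, reverse):
    """All (forward, reverse) candidate pairs that satisfy base pairing."""
    return [(f, r) for f in forward for r in reverse if is_complement(f, r)]


print(complementary_pairs(FORWARD, REVERSE))
# [((11, 14, 11, 10), (10, 11, 14, 11))]
```

Of the 16 × 18 possible pairings, only A11 G14 C11 T10 with A10 G11 C14 T11 survives the filter.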

Example 4

BCS of the Bacillus anthracis and Bacillus cereus regions

A region from B. anthracis (A₁₄G₉C₁₄T₉) and the corresponding region from B. cereus (A₁₅G₉C₁₃T₉), which differs by a C to A base change, were synthesized and subjected to ESI-TOF MS. The results are shown in FIG. 7, in which the two regions are clearly distinguished using the method of the present invention (MW 14072.26 vs 14096.29).
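By way of illustration only, the mass difference reported above can be checked against the mass change expected for replacing one C with one A. The average residue masses used below (313.21 Da for dA and 289.18 Da for dC) are commonly tabulated values and are an assumption of this example, as is the function name.

```python
# Average masses of DNA nucleotide residues in Da (assumed tabulated values).
AVG_RESIDUE = {"A": 313.21, "C": 289.18}


def substituted_mass(mass, removed, added):
    """Expected average mass after replacing one residue with another."""
    return mass - AVG_RESIDUE[removed] + AVG_RESIDUE[added]


b_anthracis = 14072.26  # A14 G9 C14 T9
b_cereus = 14096.29     # A15 G9 C13 T9

predicted = substituted_mass(b_anthracis, removed="C", added="A")
print(round(predicted, 2))             # expected mass after one C to A change
print(round(b_cereus - predicted, 2))  # difference from the reported value
```

The single C to A change accounts for the whole 24.03 Da difference between the two reported masses.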

Example 5

Identification of other organisms

In another example of the invention, the pathogen Vibrio cholerae can be distinguished from Vibrio parahaemolyticus with ΔM > 600 Da using one of the three 16S primer sets shown in Table 2 (16S_971, 16S_1228 or 16S_1294), as shown in Table 4. The two mycoplasma species in the list (M. genitalium and M. pneumoniae) can also be distinguished from each other, as can the three mycobacteria. While direct mass measurement of amplified products can identify and distinguish a large number of organisms, measurement of the base composition signature provides dramatically enhanced resolving power for closely related organisms. In cases such as Bacillus anthracis and Bacillus cereus, which are virtually indistinguishable from each other based solely on mass differences, compositional analysis or fragmentation patterns are used to resolve the differences. The single base difference between the two organisms yields different fragmentation patterns, and the two organisms can be identified despite the presence of the ambiguous/unidentified base N at position 20 in B. anthracis.

Tables 4a-b show examples of primer pairs from Table 2 that distinguish pathogens from background.

TABLE 4a

Name of organism	23S_855	16S_1337	23S_1021
Bacillus anthracis	42650.98	28447.65	30294.98
Staphylococcus aureus	42654.97	28443.67	30297.96

TABLE 4b

Name of organism	16S_971	16S_1294	16S_1228
Vibrio cholerae	55625.09	35856.87	52535.59
Vibrio parahaemolyticus	54384.91	34620.67	50064.19
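By way of illustration only, the mass differences (ΔM) implied by the masses listed in Tables 4a and 4b can be tabulated per primer pair. The data structures and function name are hypothetical; the masses are copied from the tables.

```python
TABLE_4A = {
    "Bacillus anthracis": {"23S_855": 42650.98, "16S_1337": 28447.65, "23S_1021": 30294.98},
    "Staphylococcus aureus": {"23S_855": 42654.97, "16S_1337": 28443.67, "23S_1021": 30297.96},
}
TABLE_4B = {
    "Vibrio cholerae": {"16S_971": 55625.09, "16S_1294": 35856.87, "16S_1228": 52535.59},
    "Vibrio parahaemolyticus": {"16S_971": 54384.91, "16S_1294": 34620.67, "16S_1228": 50064.19},
}


def mass_differences(table, organism_1, organism_2):
    """Mass of organism_1 minus mass of organism_2 for every primer pair, in Da."""
    return {
        primer: round(table[organism_1][primer] - table[organism_2][primer], 2)
        for primer in table[organism_1]
    }


print(mass_differences(TABLE_4A, "Bacillus anthracis", "Staphylococcus aureus"))
print(mass_differences(TABLE_4B, "Vibrio cholerae", "Vibrio parahaemolyticus"))
```

The B. anthracis and S. aureus products differ by about 3 to 4 Da for each primer pair, whereas every Vibrio primer pair gives a difference well above 600 Da.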

Table 5 shows the expected molecular weight and base composition of region 16S_1100-1188 in Mycobacterium avium and Streptomyces sp.

TABLE 5

Region	Name of organism	Length	Molecular weight	Base composition
16S_1100-1188	Mycobacterium avium	82	25624.1728	A₁₆G₃₂C₁₈T₁₆
16S_1100-1188	Streptomyces sp.	96	29904.871	A₁₇G₃₈C₂₇T₁₄

Table 6 shows base composition (single strand) results for 16S_1100-1188 primer amplification reactions for different species of bacteria. Species that are repeated in the table (e.g., Clostridium botulinum) are different strains that have different base compositions in the 16S_1100-1188 region.

TABLE 6

The same organism having different base compositions represents different strains. Groups of organisms that are highlighted in bold or italics have the same base composition in the amplified region. Some of these organisms can be distinguished using multiple primers. For example, Bacillus anthracis can be distinguished from Bacillus cereus and Bacillus thuringiensis using the primer 16S_971-1062 (Table 7). Other primer pairs that produce unique base composition signatures are shown in Table 6 (bold). Groups containing very similar threat organisms and ubiquitous non-threat organisms (e.g., the anthracis cluster) are distinguished at high resolution with focused sets of primer pairs. The known biowarfare agents in Table 6 are Bacillus anthracis, Yersinia pestis, Francisella tularensis and Rickettsia prowazekii.

TABLE 7

The sequences of B. anthracis and B. cereus in region 16S_971 are shown below. Bold indicates the single base difference between the two species, which can be detected using the methods of the invention. B. anthracis has an ambiguous base at position 20.

Bacillus anthracis 16S 971

GCGAAGAACCUUACCAGGUNUUGACAUCCUCUGACAACCCUAGAGAUAGGGC

UUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU(SEQ ID NO：1)

Bacillus cereus 16S 971

GCGAAGAACCUUACCAGGUCUUGACAUCCUCUGAAAACCCUAGAGAUAGGGC

UUCUCCUUCGGGAGCAGAGUGACAGGUGGUGCAUGGUU(SEQ ID NO：2)

Example 6

ESI-TOF MS of sspE 56-mer plus calibrant

The mass measurement accuracy that can be obtained using an internal mass standard in the ESI-MS study of PCR products is shown in FIG. 8. The mass standard was a 20-mer phosphorothioate oligonucleotide added to a solution containing a 56-mer PCR product from the B. anthracis spore coat protein sspE. The mass of the expected PCR product distinguishes B. anthracis from other species of Bacillus such as B. thuringiensis and B. cereus.

Example 7

ESI-TOF MS of synthetic Bacillus anthracis 16S_1228 duplex

ESI-TOF MS spectra were obtained from an aqueous solution containing 5 μM each of synthetic analogs of the expected forward and reverse PCR products from the nucleotide 1228 region of the Bacillus anthracis 16S rRNA gene. The results (FIG. 9) show that the molecular weights of the forward and reverse strands can be accurately determined and that the two strands can be easily distinguished. The [M-21H⁺]²¹⁻ and [M-20H⁺]²⁰⁻ charge states are shown.
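A molecular weight can be recovered from two adjacent charge states of the same strand, such as the 20- and 21- states mentioned above. The following sketch uses the standard relation for negative ions, m/z = (M - z·m_p)/z, where m_p is the proton mass. It is an illustration under that assumption rather than the data processing used in the original work, and the function names and the example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_negative(mass, z):
    """m/z of the [M - zH]^z- ion of a neutral species of the given mass."""
    return (mass - z * PROTON_MASS) / z


def charge_and_mass_from_adjacent_peaks(mz_lower_charge, mz_higher_charge):
    """Infer z and M from the peaks of charge z (larger m/z) and z + 1 (smaller m/z)."""
    if mz_lower_charge <= mz_higher_charge:
        raise ValueError("the lower charge state must have the larger m/z")
    z = round((mz_higher_charge + PROTON_MASS) / (mz_lower_charge - mz_higher_charge))
    mass = z * (mz_lower_charge + PROTON_MASS)
    return z, mass


if __name__ == "__main__":
    hypothetical_mass = 14000.0  # Da, illustrative only
    peak_20 = mz_negative(hypothetical_mass, 20)
    peak_21 = mz_negative(hypothetical_mass, 21)
    print(charge_and_mass_from_adjacent_peaks(peak_20, peak_21))
```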

Example 8

ESI-FTICR-MS of synthetic Bacillus anthracis 16S_1337 46 base pair duplex

ESI-FTICR-MS spectra were obtained from an aqueous solution containing 5 μM each of synthetic analogs of the expected forward and reverse PCR products from the nucleotide 1337 region of the Bacillus anthracis 16S rRNA gene. The results (FIG. 10) show that the molecular weights of the two strands can be distinguished by this method. The [M-16H⁺]¹⁶⁻ through [M-10H⁺]¹⁰⁻ charge states are shown. The inset highlights the resolution achievable on the FTICR-MS instrument, which allows the charge state of an ion to be determined from the mass difference between peaks that differ by a single ¹³C substitution.
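The charge-state determination described here follows from the fact that neighboring isotope peaks differ in mass by one ¹³C-for-¹²C substitution (about 1.003355 Da), so their spacing on the m/z axis is that difference divided by the charge. The sketch below applies this relation. The function name is hypothetical and the input spacings are illustrative, not measured values from FIG. 10.

```python
C13_C12_MASS_DIFFERENCE = 1.003355  # Da, mass of 13C minus mass of 12C


def charge_from_isotope_spacing(mz_spacing):
    """Estimate the charge state from the m/z spacing of adjacent isotope peaks."""
    if mz_spacing <= 0:
        raise ValueError("isotope spacing must be positive")
    return round(C13_C12_MASS_DIFFERENCE / mz_spacing)


if __name__ == "__main__":
    # Illustrative spacings for ions carrying 16 and 10 charges.
    for spacing in (C13_C12_MASS_DIFFERENCE / 16, C13_C12_MASS_DIFFERENCE / 10):
        print(f"spacing {spacing:.5f} -> charge {charge_from_isotope_spacing(spacing)}")
```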

Example 9

ESI-TOF MS of 56-mer oligonucleotide from the Bacillus anthracis saspB gene with internal mass standard

ESI-TOF MS spectra of a synthetic 56-mer oligonucleotide (5 μM) from the Bacillus anthracis saspB gene, containing an internal mass standard, were obtained at an ESI flow rate of 1.7 μL/min as a function of sample consumption. The results (FIG. 11) show that the signal-to-noise ratio improves as more scans are summed, and that the standard and the product are visible after only 100 scans.
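The benefit of summing scans can be illustrated with a small simulation. The sketch below assumes a constant peak on top of independent Gaussian noise in every scan, which is a simplification of a real spectrum, and it averages rather than sums the scans (the two differ only by a scale factor). All names and parameter values are hypothetical.

```python
import random
import statistics


def averaged_scans(n_scans, n_channels, peak_channel, peak_height, noise_sd, rng):
    """Average n_scans simulated spectra with one peak and Gaussian baseline noise."""
    totals = [0.0] * n_channels
    for _ in range(n_scans):
        for ch in range(n_channels):
            signal = peak_height if ch == peak_channel else 0.0
            totals[ch] += signal + rng.gauss(0.0, noise_sd)
    return [t / n_scans for t in totals]


def baseline_noise(spectrum, peak_channel):
    """Standard deviation of all channels other than the peak channel."""
    return statistics.stdev(v for ch, v in enumerate(spectrum) if ch != peak_channel)


def signal_to_noise(spectrum, peak_channel):
    """Peak height divided by the baseline noise."""
    return spectrum[peak_channel] / baseline_noise(spectrum, peak_channel)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (1, 10, 100):
        spectrum = averaged_scans(n, 200, 100, 1.0, 1.0, rng)
        print(f"{n:4d} scans: S/N = {signal_to_noise(spectrum, 100):.2f}")
```

Under these assumptions the baseline noise falls roughly as the square root of the number of scans, which is why a peak that is lost in a single scan can become visible after many scans.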

Example 10

ESI-TOF MS of an internal mass standard with tributylammonium (TBA)-trifluoroacetate (TFA) buffer

An ESI-TOF MS spectrum of the 20-mer phosphorothioate mass standard was obtained after 5 mM TBA-TFA buffer was added to the solution. This buffer strips charge from the oligonucleotide and shifts the most abundant charge state from [M-8H⁺]⁸⁻ to [M-3H⁺]³⁻ (FIG. 12).
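The effect of this charge reduction on peak position can be sketched with the usual negative-ion relation m/z = (M - z·m_p)/z. The mass used below is a placeholder, since the mass of the 20-mer standard is not stated in this section, and the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_negative(mass, z):
    """m/z of the [M - zH]^z- ion of a neutral species of the given mass."""
    if z <= 0:
        raise ValueError("charge must be a positive integer")
    return (mass - z * PROTON_MASS) / z


if __name__ == "__main__":
    placeholder_mass = 6500.0  # Da, hypothetical value for illustration only
    for z in range(8, 2, -1):
        print(f"[M-{z}H]{z}-: m/z = {mz_negative(placeholder_mass, z):.2f}")
```

Lower charge states appear at higher m/z, so shifting the dominant state from 8- to 3- moves the standard's strongest peak to a much higher m/z.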

Claims

1. A method of identifying an unknown organism, the method comprising:

a) contacting a nucleic acid of said organism with at least one pair of oligonucleotide primers that hybridize to sequences of said nucleic acid, wherein said sequences flank a variable nucleic acid sequence of said organism;

b) amplifying the variable nucleic acid sequence to produce an amplification product;

c) determining the molecular weight of the amplification product;

d) comparing said molecular weight to the molecular weight of one or more amplification products obtained by performing steps a)-c) on a plurality of known organisms, wherein a match identifies said unknown organism.

2. The method of claim 1, wherein said sequences to which said at least one pair of oligonucleotide primers hybridize are highly conserved.

3. The method of claim 1, wherein the amplifying step comprises polymerase chain reaction.

4. The method of claim 1, wherein the amplifying step comprises ligase chain reaction or strand displacement amplification.

5. The method of claim 1, wherein the organism is a bacterium, virus, cell, or spore.

6. The method of claim 1, wherein the nucleic acid is ribosomal RNA.

7. The method of claim 1, wherein the nucleic acid encodes RNase P or an RNA-dependent RNA polymerase.

8. The method of claim 1, wherein the amplification product is ionized prior to molecular mass determination.

9. The method of claim 1, further comprising the step of isolating nucleic acids from the organism prior to contacting the nucleic acids with the at least one pair of oligonucleotide primers.

10. The method of claim 1, further comprising the steps of performing steps a)-d) with a different oligonucleotide primer pair and comparing the results to the molecular weights of one or more amplification products obtained by performing steps a)-c) on a plurality of known organisms different from those of step d).

11. The method of claim 1, wherein the one or more molecular weights are contained in a molecular weight database.

12. The method of claim 1, wherein the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment.

13. The method of claim 1, wherein the molecular weight is determined by mass spectrometry.

14. The method of claim 13, wherein the mass spectrometry is selected from Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF or triple quadrupole.

15. The method of claim 1, further comprising performing step b) in the presence of an analog of adenine, thymidine, guanosine or cytidine having a different molecular weight than adenosine, thymidine, guanosine or cytidine.

16. The method of claim 1, wherein said oligonucleotide primer comprises a base analog at positions 1 and 2 of each triplet within said primer, wherein said base analog binds its complement with higher affinity than the native base.

17. The method of claim 16, wherein said primer comprises a universal base at position 3 of each triplet within said primer.

18. The method of claim 16, wherein the base analog is selected from the group consisting of 2,6-diaminopurine, propyne T, propyne G, phenazine, and G-clamp.

19. The method of claim 16, wherein the universal base is selected from the group consisting of inosine, guanidine, 5-nitroindole, 3-nitropyrrole, dP, dK, and 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.

20. A method of identifying an unknown organism, the method comprising:

a) contacting a nucleic acid of said organism with at least one pair of oligonucleotide primers capable of hybridizing to sequences of said nucleic acid, wherein said sequences flank a variable nucleic acid sequence;

b) amplifying said variable nucleic acid sequence to produce an amplification product;

c) determining the base composition of the amplification product;

d) comparing said base composition to the base composition of one or more amplification products obtained by performing steps a)-c) on a plurality of known organisms, wherein a match identifies said unknown organism.

21. The method of claim 20, wherein said sequences to which said at least one pair of oligonucleotide primers hybridize are highly conserved.

22. The method of claim 20, wherein the amplifying step comprises polymerase chain reaction.

23. The method of claim 20, wherein the amplifying step comprises ligase chain reaction or strand displacement amplification.

24. The method of claim 20, wherein the organism is a bacterium, virus, cell, or spore.

25. The method of claim 20, wherein the nucleic acid is ribosomal RNA.

26. The method of claim 20, wherein the nucleic acid encodes RNase P or an RNA-dependent RNA polymerase.

27. The method of claim 20, wherein the amplification product is ionized prior to base composition determination.

28. The method of claim 20, further comprising the step of isolating nucleic acids from said organism prior to contacting said nucleic acids with said at least one pair of oligonucleotide primers.

29. The method of claim 20, further comprising the steps of performing steps a)-d) with a different oligonucleotide primer pair and comparing the results to the base compositions of one or more amplification products obtained by performing steps a)-c) on a plurality of known organisms different from those of step d).

30. The method of claim 20, wherein the one or more base composition signatures are comprised in a base composition signature database.

31. The method of claim 20, wherein the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment.

32. The method of claim 20, wherein said base composition signature is determined by mass spectrometry.

33. The method of claim 32, wherein said mass spectrometry is selected from Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF or triple quadrupole.

34. The method of claim 20, further comprising performing step b) in the presence of an analog of adenine, thymidine, guanosine or cytidine having a different molecular weight than adenosine, thymidine, guanosine or cytidine.

35. The method of claim 20, wherein said oligonucleotide primer comprises a base analog at positions 1 and 2 of each triplet within said primer, wherein said base analog binds its complement with higher affinity than the native base.

36. The method of claim 35, wherein said primer comprises a universal base at position 3 of each triplet within said primer.

37. The method of claim 35, wherein said base analog is selected from the group consisting of 2,6-diaminopurine, propyne T, propyne G, phenazine, and G-clamp.

38. The method of claim 36, wherein the universal base is selected from the group consisting of inosine, guanidine, 5-nitroindole, 3-nitropyrrole, dP, dK, and 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.

39. A method for detecting a single nucleotide polymorphism in an individual, comprising the steps of:

a) isolating the nucleic acid of the individual;

b) contacting said nucleic acid with oligonucleotide primers that hybridize to regions of said nucleic acid that flank a region containing a potential polymorphism;

c) amplifying the region to produce an amplification product;

d) determining the molecular weight of the amplification product;

e) comparing the molecular weight to the molecular weight of the region in an individual known to have the polymorphism, wherein if the molecular weights are the same, the individual has the polymorphism.

40. The method of claim 39, wherein said polymorphism is associated with a disease.

41. The method of claim 39, wherein said polymorphism is a blood group antigen.

42. The method of claim 39, wherein the amplifying step is a polymerase chain reaction.

43. The method of claim 39, wherein the amplification step is ligase chain reaction or strand displacement amplification.

44. The method of claim 39, wherein the amplification product is ionized prior to mass measurement.

45. The method of claim 39, wherein the amplification product is ionized by electrospray ionization, matrix-assisted laser desorption, or fast atom bombardment.

46. The method of claim 39, wherein the primer hybridizes to a conserved sequence.

47. The method of claim 39, wherein the molecular weight is determined by mass spectrometry.

48. The method of claim 47, wherein said mass spectrometry is selected from Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ion trap, quadrupole, magnetic sector, time of flight (TOF), Q-TOF, or triple quadrupole.