WO2010065322A1 - Concurrent identification of multitudes of polypeptides - Google Patents

Concurrent identification of multitudes of polypeptides

Publication number
WO2010065322A1
WO2010065322A1 PCT/US2009/065086 US2009065086W WO2010065322A1 WO 2010065322 A1 WO2010065322 A1 WO 2010065322A1 US 2009065086 W US2009065086 W US 2009065086W WO 2010065322 A1 WO2010065322 A1 WO 2010065322A1
Authority
WO
WIPO (PCT)
Prior art keywords
amino acid
terminal amino
polypeptide
complexing agents
polypeptides
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2009/065086
Other languages
French (fr)
Inventor
Benjamin J. Cargile
James L. Stephenson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
RTI International Inc
Original Assignee
RTI International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by RTI International Inc
Priority to CA2745197A (patent CA2745197A1)
Publication of WO2010065322A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6818 Sequencing of polypeptides
    • G01N33/6824 Sequencing of polypeptides involving N-terminal degradation, e.g. Edman degradation
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B82 NANOTECHNOLOGY
    • B82Y SPECIFIC USES OR APPLICATIONS OF NANOSTRUCTURES; MEASUREMENT OR ANALYSIS OF NANOSTRUCTURES; MANUFACTURE OR TREATMENT OF NANOSTRUCTURES
    • B82Y15/00 Nanotechnology for interacting, sensing or actuating, e.g. quantum dots as markers in protein assays or molecular motors
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/58 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances
    • G01N33/588 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances with semiconductor nanocrystal label, e.g. quantum dots

Definitions

  • the invention relates to the use of complexing agents with specificity for N-terminal amino acids or their derivatives in sequencing and structurally characterizing polypeptides.
  • Complexing agents may include, but are not limited to, antibodies, proteins, peptides, peptoids, DNA, RNA, PNA, GNA, TNA, or aptamers.
  • Chemical protein sequencing has been and continues to be a popular method for determining the primary structure of proteins. See Stolowitz, “Chemical Protein Sequencing and Amino Acid Analysis,” Curr. Opin. Biotech. 4:9-13 (1993) and Hunkapiller, M. W., "Contemporary Methodology for the Determination of the Primary Structure of Proteins," Macromol. Seq. and Synthesis, Ed. D. H. Schlesinger, pp.45-58, Alan R. Liss: New York, N.Y. (1988).
  • Edman degradation typically includes a derivatization step, a cleavage step, and a conversion step.
  • For example, in an Edman degradation, the amino terminus of a target polypeptide is derivatized to a thiocarbamoyl, which is cleaved from the polypeptide with an organic acid.
  • the cleaved amino acid may be converted to a phenylthiohydantoin (PTH) form by treatment with an aqueous solution of organic acid.
  • the PTH amino acid may then be detected, for example, by high pressure liquid chromatography (HPLC) or by mass spectrometry (Aebersold, R., et al., "Design, Synthesis, and Characterization of a Protein Sequencing Reagent Yielding Amino Acid Derivatives with Enhanced Detectability by Mass Spectrometry," Protein Science 1:494-503 (1992)).
  • the reagents of the Edman process may be delivered to a target polypeptide which is covalently or non-covalently attached to a solid support.
  • Solid supports used in protein sequencing include polyvinylidene difluoride (PVDF), glass beads or polystyrene beads.
  • the degradation step includes the thioacetylation of the amino-terminal amino acid for detection by gas chromatography/mass spectrometry (Stolowitz, M L et al., "Thioacetylation Method of Protein Sequencing: Gas Chromatography/Ion Trap Mass Spectrometric Detection of 5-acetoxy-2-Methylthiazoles," J Protein Chem. 11:360-361 (1992)).
  • dabsyl chloride labels the peptide, which is then hydrolyzed with hydrochloric acid. The dabsyl-amino acid may be identified chromatographically.
  • Enzymatic digestion of terminal amino acids has been used to sequence polypeptides, for example, using amino-terminal or carboxy-terminal specific exopeptidases.
  • Examples of exopeptidases include aminopeptidase 1, LAP (Liver Activating Protein), proline aminodipeptidase, leucine aminopeptidase, microsomal peptidase and cathepsin C.
  • Serine carboxypeptidases have proven to be useful in sequentially removing residues from the C-terminus of a protein or a polypeptide.
  • Carboxypeptidase Y (CPY) is an attractive enzyme because it non-specifically cleaves all residues from the C-terminus, including proline.
  • the procedures described above may require, at a minimum, sub-femtomole (>10⁻¹⁵) concentrations of polypeptide.
  • the methods may also be sensitive to the purity of the polypeptide sample, which may give rise to sequencing errors. Carryover of incomplete amino-terminal cleavage into the next cycle may result in a steadily increasing proportion of a population of molecules being out of phase with the expected order of release. Additionally, recovery and detection of the cleaved N-terminal amino acid may be difficult and/or time-consuming under current procedures.
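The carryover effect described above can be illustrated with a simple model. The sketch below assumes, purely for illustration, that each cycle cleaves the terminal residue of every molecule independently with a fixed probability `efficiency`. The function name and the 95% figure are hypothetical and do not come from the source.

```python
def in_phase_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of molecules still in phase after `cycles` rounds.

    Illustrative assumption: every cycle cleaves each molecule
    independently with probability `efficiency`, and a molecule that
    misses one cleavage stays out of phase from then on.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return efficiency ** cycles


if __name__ == "__main__":
    # Hypothetical 95% per-cycle cleavage efficiency.
    for n in (1, 10, 30):
        print(n, round(in_phase_fraction(0.95, n), 3))
```

Under this assumption the in-phase population shrinks geometrically with cycle number, which is one way to read the "steadily increasing proportion" of out-of-phase molecules.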
  • any unique protein or series of particular marker proteins may be targeted to detect the presence of a particular species.
  • For protein toxins, the toxin molecule itself may be used as the biomarker.
  • Current technologies for multiplexed protein analysis include antibody microarrays, 2-dimensional gel electrophoresis, and shotgun proteomics. These methods may be limited in their ability to overcome the sensitivity and selectivity problems associated with the analysis of such a diverse chemical milieu of polypeptides present in biological samples.
  • the methods herein are directed to new assay technologies that allow identification and quantitation of multitudes of peptides at the single molecule level with a low false positive rate.
  • This technology has application to the specific detection of novel, emerging or engineered peptides or proteins as well as application to the detection of functional signatures of known peptides or proteins.
  • Methods herein described may provide a highly sensitive and rapid method for sequencing a polypeptide that does not require labeling of the target polypeptide before sequencing and avoids the repeated isolation and analysis of cleaved portions of a polypeptide as in past sequencing methods. Methods described herein apply to sequencing from either the C- or N-terminus of any peptide.
  • the methods described herein provide for identifying the N-terminal amino acid of a peptide while it is still covalently linked to the peptide.
  • the methods described herein may provide for sequencing or structurally characterizing a polypeptide using an N-terminal amino acid complexing agent.
  • a process for simultaneously identifying N-terminal amino acids of two or more polypeptides comprises forming a complex between an N-terminal amino acid moiety of a polypeptide with a complexing agent capable of binding to the N-terminal amino acid moiety.
  • the complexing agent has a detectable label. Detecting the detectable label in the complex provides for identification of the N-terminal amino acid of the polypeptide.
  • a process for determining at least a portion of the amino acid sequence of a polypeptide of interest comprises the steps of forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety of the polypeptide.
  • the complexing agent is selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection.
  • the detectable label in the complex is detected in order to identify the N-terminal amino acid moiety of the polypeptide.
  • the N-terminal amino acid moiety from the polypeptide is removed.
  • the steps of forming a complex and detecting the detectable label in the complex are repeated in order to determine at least a portion of the amino acid sequence of the polypeptide of interest.
  • a process for determining at least a portion of the amino acid sequence of a plurality of polypeptides in a sample comprises bonding at least some of the plurality of polypeptides of the sample, each at a specific location on a surface.
  • the surface is contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection.
  • Complexes between N-terminal amino acid moieties of the at least some of the polypeptides and a complexing agent from the set of complexing agents are formed.
  • the detectable label in the complex is detected at specific locations on the surface in order to identify the N-terminal amino acid moiety at the specific locations on the surface.
  • the N-terminal amino acid moieties from the polypeptides at the specific locations on the surface are removed and the complexing and detection steps are repeated in order to determine at least a portion of the amino acid sequence of the at least some of the polypeptides at the specific locations on the surface.
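The cycle described above (bind polypeptides at known locations, contact them with labeled complexing agents, detect the labels, remove the terminal residue, repeat) can be sketched as a small simulation. This is a minimal sketch, not the claimed process: the label names, the four-residue agent set, and the function `sequence_surface` are all hypothetical, and binding and removal are assumed to be perfect.

```python
from typing import Dict, List

# Hypothetical agent set: one distinguishable label per N-terminal residue.
AGENT_LABELS: Dict[str, str] = {"A": "dye-1", "G": "dye-2", "K": "dye-3", "L": "dye-4"}
LABEL_TO_RESIDUE: Dict[str, str] = {label: res for res, label in AGENT_LABELS.items()}


def sequence_surface(surface: Dict[str, str], cycles: int) -> Dict[str, str]:
    """Run complex -> detect -> remove for every surface location in parallel.

    `surface` maps a location name to the peptide bound there, written
    N-terminus first in one-letter code. Returns the residues called at
    each location; "X" marks a cycle where no agent in the set bound.
    """
    remaining = dict(surface)
    reads: Dict[str, List[str]] = {loc: [] for loc in surface}
    for _ in range(cycles):
        for loc in list(remaining):
            peptide = remaining[loc]
            if not peptide:
                continue  # nothing left to sequence at this location
            label = AGENT_LABELS.get(peptide[0])                # complex formation
            reads[loc].append(LABEL_TO_RESIDUE.get(label, "X"))  # detection
            remaining[loc] = peptide[1:]                        # removal
    return {loc: "".join(calls) for loc, calls in reads.items()}


if __name__ == "__main__":
    print(sequence_surface({"spot-1": "GALK", "spot-2": "KLQ"}, cycles=3))
```

Because every location is read in the same cycle, the number of cycles sets the read length while the number of locations sets how many polypeptides are read at once.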
  • FIG. 1 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide.
  • FIG. 2 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide using an aptamer complexing agent.
  • FIG. 3 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide using N-terminal aminopeptidases.
  • a process for identifying the N-terminal amino acid of a polypeptide comprises forming a complex between an N-terminal amino acid moiety of a polypeptide with a complexing agent having specificity for the N-terminal amino acid moiety.
  • the complexing agent has a detectable label. Detecting the detectable label in the complex provides for identification of the N-terminal amino acid of the polypeptide.
  • a process for determining at least a portion of the amino acid sequence of a polypeptide of interest is provided.
  • the process comprises the steps of forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety of the polypeptide.
  • the complexing agent is selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) has a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection.
  • the detectable label in the complex is detected in order to identify the N-terminal amino acid moiety of the polypeptide.
  • the N-terminal amino acid moiety from the polypeptide is removed.
  • a process for determining at least a portion of the amino acid sequence of a plurality of polypeptides in a sample comprises bonding at least some of the plurality of polypeptides of the sample, each at a specific location on a surface.
  • the surface is contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, wherein each complexing agent in the set has (a) specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) each complexing agent comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection.
  • Complexes between N-terminal amino acid moieties of the at least some of the polypeptides and a complexing agent from the set of complexing agents are formed.
  • the detectable label in the complex is detected at specific locations on the surface in order to identify the N-terminal amino acid moiety at the specific locations on the surface.
  • the N-terminal amino acid moieties from the polypeptides at the specific locations on the surface are removed and the complexing and detection steps are repeated in order to determine at least a portion of the amino acid sequence of the at least some of the polypeptides at the specific locations on the surface.
  • the methods of the present invention may be performed by (a) labeling the N-terminal amino acid of a polypeptide with a complexing agent having a detectable label, (b) detecting the presence of the labeled N-terminal amino acid, and (c) removing the N-terminal amino acid using an N-terminal amino acid removing agent having specificity for one or more labeled amino acids. Such steps may be repeated to identify each subsequent N-terminal amino acid after removal of the prior complexed N-terminal amino acid. Such methods may be performed on multiple identical or different polypeptides in a high-throughput method to determine the sequence of multiple polypeptides simultaneously. For example, the methods of the present invention can be used to determine the amino acid sequence of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 150, 500, or more different polypeptides simultaneously.
  • Definitions:
  • polypeptide refers generally to a molecule that comprises one or more amino acid monomers covalently linked together.
  • Polypeptide includes proteins as well as short polypeptides that are approximately 100 amino acids or less in length. In one embodiment, the polypeptide is 10 amino acids or greater in length.
  • Polypeptides may be artificially synthesized, isolated from nature or modified for compatibility with the methods herein described (e.g., the polypeptide may be digested with trypsin to reduce its size, or other enzymes may be added to remove polysaccharides, neutralizing by mild acid or neuraminidase to remove sialic acid, reacted with alkaline phosphatase to remove phosphate, or with sulfatases or by chemical means to remove sulfate or oxidize thiols).
  • N-terminal amino acid moiety refers to an N-terminal amino acid or its derivative.
  • the term N-terminal amino acid moiety includes post-translationally modified N-terminal amino acids.
  • Post-translationally modified N-terminal amino acids include, for example, amino acids resulting from deamidation of glutamine or asparagine, or partial tryptic peptides.
  • C-terminal amino acid moiety refers to a C-terminal amino acid or its derivative.
  • C-terminal amino acid moiety includes post-translationally modified C-terminal amino acids.
  • Post-translationally modified C-terminal amino acids include, for example, amino acids resulting from methylation or amidation of the C-terminal residue.
  • subset of N-terminal amino acid moieties refers to a group of N-terminal amino acid moieties, less than all of the amino acids, having a shared chemical or structural relationship.
  • the subset of aspartic acid and glutamic acid N-terminal moieties are chemically related as being acidic.
  • the N-terminal amino acid moieties of histidine, phenylalanine, tryptophan, and tyrosine are structurally related as having an aromatic substituent, for example.
  • the term "subset of C-terminal amino acid moieties” refers to a group of C-terminal amino acid moieties, less than all of the amino acids, having a shared chemical or structural relationship.
  • the subset of aspartic acid and glutamic acid C-terminal moieties are chemically related as being acidic.
  • the C-terminal amino acid moieties of histidine, phenylalanine, tryptophan, and tyrosine are structurally related as having an aromatic substituent.
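The subset definitions above imply that an agent specific for a subset identifies a terminal residue only up to its group. A minimal sketch of how such subset-level calls could still narrow a list of candidate peptides follows. The two groups are the ones named in the text (in one-letter code); the function names and the candidate peptides are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

# Subsets named in the text, in one-letter amino acid code.
SUBSETS: Dict[str, FrozenSet[str]] = {
    "acidic": frozenset("DE"),      # aspartic acid, glutamic acid
    "aromatic": frozenset("HFWY"),  # histidine, phenylalanine, tryptophan, tyrosine
}


def matches(subset_calls: Sequence[str], peptide: str) -> bool:
    """True if the peptide's leading residues fall in the called subsets."""
    if len(peptide) < len(subset_calls):
        return False
    return all(res in SUBSETS[call] for call, res in zip(subset_calls, peptide))


def filter_candidates(subset_calls: Sequence[str], candidates: Sequence[str]) -> List[str]:
    """Keep only the candidate peptides consistent with the subset-level read."""
    return [p for p in candidates if matches(subset_calls, p)]


if __name__ == "__main__":
    # Hypothetical read: cycle 1 gave an "acidic" label, cycle 2 an "aromatic" one.
    print(filter_candidates(["acidic", "aromatic"], ["DFGK", "EWAA", "KDFA", "DA"]))
```

In this sketch a read never names an exact residue, yet each additional cycle removes candidates that disagree with the observed group.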
  • the term “peptoid” refers to a peptidomimetic that results from the oligomeric assembly of N-substituted glycines.
  • the peptoid may be substituted along its backbone in a manner analogous to an amino acid molecule.
  • the term "aptamer” refers generally to small single stranded RNAs or DNAs of approximately 10-120 nucleotides in length that are capable of forming secondary and tertiary structures. Aptamers include oligonucleotides which bind to N-terminal amino acid moieties. Aptamers include, for example, those having affinity to a specific amino acid as disclosed in Gold, et al., "Diversity of Oligonucleotide Function," Ann. Rev. Biochem. 64: 763-97 (1995).
  • N-terminal amino acid removing reagent refers to a compound or composition of matter capable of removing a single amino acid moiety from the N-terminus of the polypeptide.
  • N-terminal amino acid removing reagents include aminopeptidases such as leucine aminopeptidase, microsomal peptidase, aminopeptidase 1, LAP (Liver Activating Protein), proline aminodipeptidase, cathepsin C and those identified using the methods disclosed herein.
  • the N-terminal amino acid removing reagent may be a chemical compound, such as those known in the art for catalyzing the cleavage of the terminal monomers of polypeptides.
  • Chemical agents include, but are not limited to, cyanogen bromide, hydrochloric acid, sulfuric acid, and pentafluoropropionic fluorohydride.
  • “comprising,” “including,” “containing,” “characterized by,” and grammatical equivalents thereof are inclusive or open-ended terms that do not exclude additional, unrecited elements or method steps. “Comprising” is to be interpreted as including the more restrictive terms “consisting of” and “consisting essentially of.”
  • the methods herein described comprise forming a complex between an N-terminal amino acid moiety and a complexing agent.
  • the complex may be detected and the N-terminal amino acid identified.
  • the complexing agent may be washed away after the N-terminal amino acid is identified and the N-terminal amino acid may then be removed should the identification of the adjacent N-terminal amino acid be desired. In the event that the identification of additional adjacent N-terminal amino acids is desired, the method may be repeated.
  • a derivative of the N-terminal amino acid may be formed with a compound that forms a stable covalent bond.
  • a derivative of the N-terminal amino acid may be formed from reaction with fluorodinitrobenzenes, dabsyl chlorides, dansyl chlorides or phenyl isothiocyanates.
  • the derivative may be a phenylthiocarbamoyl derivative of the N-terminal amino acid formed by reaction with a phenyl isothiocyanate.
  • the methods herein described utilize a complexing agent specific for an N-terminal amino acid moiety of a peptide or subset of N-terminal amino acid moieties.
  • Complexing agents specific for N-terminal amino acid moieties may have high affinity and/or high specificity for the N-terminal amino acid moiety.
  • Complexing agents may have affinities, defined as an equilibrium dissociation constant, in the micromolar to about sub-nanomolar range; however, complexing agents having other affinities may also be used.
  • High affinity and high specificity binding complexing agents may be derived, for example, from combinatorial libraries.
  • the methods may utilize a complexing agent capable of complexing with an N-terminal amino acid moiety of a peptide.
  • Such complexing agents may be capable of complexing with N-terminal amino acid moieties with high affinity to the N-terminal amino acid moieties.
  • the complexing agent may be an antibody. Antibodies specific to N-terminal amino acid moieties may be screened for specificity for an N-terminal amino acid moiety, isolated and labeled using conventional chemistries and techniques. The antibodies may be mono- or polyclonal or may be fragments of whole antibody. Antibodies specific to N-terminally modified peptide (e.g., PITC-amino acids) may be generated and used to substitute for another complexing agent (e.g., aptamer) that otherwise could not be readily generated.
  • the complexing agent may be a protein, peptide, or peptoid.
  • the protein, peptide, or peptoid may be screened for specificity for an N-terminal amino acid moiety, isolated and labeled using conventional chemistries and techniques. (See, e.g. Emili, et al., Nature Biotech., (2000) 18:393-397)
  • the complexing agent may be DNA, RNA (for example, tRNA), peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA) or an aptamer.
  • the DNA, RNA, PNA, GNA, TNA may be single or double stranded with at least a portion thereof capable of specifically binding an N-terminal amino acid moiety.
  • Such complexing agents with specificity for an N-terminal amino acid moiety may be screened, isolated and labeled using conventional and modifications of conventional chemistries and techniques.
  • a preferred complexing agent may include RNA or DNA aptamers.
  • Any method may be used to screen randomized oligonucleotides for aptamer activity.
  • General methods for screening randomized oligonucleotides for aptamer activity such as, for example, the "SELEX” (Systematic Evolution of Ligands by Exponential Enrichment) method as described in Gold, et al. (U.S. Pat. No. 5,270,163) are known.
  • the size of the epitope may be increased by making aptamers to the corresponding PITC derivatized amino acid(s).
  • aptamers capable of recognizing N-terminal amino acids from a peptide with high affinity and specificity include aptamers for both phenyl isothiocyanate modified and unmodified amino acids.
  • Aptamers with specificity and affinity for N-terminal amino acid moieties may be used for polypeptide and protein identification and/or sequencing according to the methods herein described.
  • Aptamers may have high affinities, with equilibrium dissociation constants ranging from between 100 micromolar to sub-nanomolar depending on the selection used, and/or have high selectivity for a single N-terminal amino acid moiety. For example, aptamers with equilibrium dissociation constants less than 20 µM, less than 10 µM, less than 5 µM, or less than 3 µM may be used. Aptamers may be of high affinity for more than one N-terminal amino acid moiety, for example, a subset of N-terminal amino acid moieties. In such a case, the affinity for the subset of N-terminal amino acid moieties may be detectably distinguishable.
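To make the dissociation constants above concrete, the sketch below computes the equilibrium fraction of N-terminal sites occupied by a complexing agent. It assumes simple 1:1 binding with the agent in large excess over the bound polypeptide, so that occupancy is [agent] / (Kd + [agent]). The 10 µM agent concentration and the function name are illustrative assumptions, not values from the source.

```python
def fraction_bound(agent_conc_um: float, kd_um: float) -> float:
    """Equilibrium occupancy for 1:1 binding with the agent in excess.

    Both arguments are in micromolar. Assumes the free agent
    concentration is not depleted by binding.
    """
    if agent_conc_um < 0:
        raise ValueError("concentration must be non-negative")
    if kd_um <= 0:
        raise ValueError("Kd must be positive")
    return agent_conc_um / (kd_um + agent_conc_um)


if __name__ == "__main__":
    agent = 10.0  # hypothetical agent concentration, in µM
    for kd in (20.0, 10.0, 5.0, 3.0):  # Kd thresholds mentioned in the text
        print(f"Kd = {kd:>4} µM -> occupancy {fraction_bound(agent, kd):.2f}")
```

Under these assumptions occupancy is exactly one half when the agent concentration equals Kd, and a lower Kd gives higher occupancy at the same concentration.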
  • Aptamers may be selective for more than one amino acid moiety, for example, a subset of N-terminal amino acid moieties. In such a case, the selectivity for the subset of N-terminal amino acid moieties may be detectably distinguishable.
  • Aptamers may be modified to improve binding specificity or stability as long as the aptamer retains a portion of its ability to bind and recognize its target amino acid. Methods for modifying the bases and sugars of nucleotides are known in the art.
  • An aptamer may comprise a phosphodiester, phosphoroamidite, phosphorothioate or other known linkage between its nucleotides if the linkage does not substantially interfere with the interaction of the aptamer with its target amino acid.
  • Aptamers suitable for use in the methods herein described may be synthesized by a polymerase chain reaction (PCR), a DNA or RNA polymerase, a chemical reaction or a machine synthesizer according to standard methods known in the art such as, for example, an automated DNA synthesizer from Applied Biosystems, Inc. (Foster City, Calif.) using standard chemistries.
  • Aptamers against derivatized amino acids e.g., PITC-amino acid derivative
  • Multiple complexing agents may be used.
  • a set of complexing agents wherein each complexing agent in the set has (a) specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) each complexing agent comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection.
  • the number of complexing agents may be equal to or less than the number of amino acids present, known or expected to be present in the sequences of the peptides.
  • the selectivity of the complexing agent may be such that the complexing agent is nonspecific to a limited number of amino acids present, known or expected in the sequences of the peptides.
  • the affinity of a non-selective complexing agent may be such that although the complexing agent is non-specific to more than one amino acid, it has a higher affinity for one of the amino acids.
  • Properties of the complexing agent may be improved by modification of all or a portion of the complexing agent.
  • a pyrimidine may be replaced with a 2'-fluoro-pyrimidine to increase its affinity, or the backbone of the RNA aptamer may be replaced by phosphorothioate or phosphoroamidite to increase the stability of an aptamer or its affinity for its target.
  • Mixtures of complexing agents may be exposed to the target N-terminal amino acid moiety and then subjected to crosslinking such that a covalent linkage is formed between the relevant complexing agents and target amino acid moiety.
  • Such modifications should not substantially interfere with PCR amplification of modified nucleotide-based complexing agents.
  • Improving the properties of the complexing agent includes selection of a suitable group of agents; the group of agents may then be modified and/or partitioned to select for improved affinity and selectivity. Modification may also be made to the complexing agent to limit the non-specific binding of the agent.
  • the complexing agent may comprise a detectable label.
  • the detectable label may be a radiolabel, luminescent, chemiluminescent, electrochemical, colorimetric, or colloidal gold label.
  • Luminescent labels include fluorophores and quantum dots.
  • Complexing agents specific for different N-terminal amino acid moieties may be labeled with a different label such that each amino acid known or expected to be present in the polypeptide may be detected and/or distinguished. When multiple complexing agents specific for different N-terminal amino acid moieties are used, for example, a set of complexing agents, each complexing agent may be specific for an N-terminal amino acid moiety and may have a detectable label that is distinguishable from the other detectable labels of the set.
  • the wavelength emissions of each label corresponding to each N-terminal amino acid specific complexing agent may be distinct from each other.
  • the complexing agents may optionally have two or more of the same or different labels attached to them.
  • fluorophores useful in the methods herein described may be commercially obtainable, such as TAMRA, Hoechst dye, fluorescein, rhodamine, Texas Red, 40 nm fluorescent beads sold by Molecular Probes TransFluro Spheres (Q-dots™), or any of the Cy or Sypro dyes.
  • Fluorophore labels may be attached to a complexing agent by standard methodologies.
  • Fluorescent molecules may include quantum dots, such as QDots™ (Invitrogen).
  • the complexing agents used in the sequencing or physical characterization methods herein described may be labeled or tagged.
  • the label may be an optically detectable species.
  • the label may be a moiety capable of luminescence, light scattering, or wavelength shifting, and may be directly or indirectly associated with binding of the complexing agent with the N-terminal amino acid.
  • the methods of the present invention may involve the detection of a terminal amino acid having a detectable label, then removing the terminal amino acid using a removing agent specific for the terminal amino acid.
  • the methods herein described may utilize an N-terminal amino acid removing agent having specificity for one or more N-terminal amino acids of a peptide or subset of N-terminal amino acid moieties.
  • the methods herein disclosed may provide a high-throughput protein detection and characterization technique using single peptide molecule detection. Single molecule detection may be achieved, for example, with a variety of different optical means using luminescent or fluorescent molecules.
  • the label of the complexing agent used in the methods herein described may be detected by methods known in the art.
  • the label of the complexing agent may be detected with a suitably configured optical microscope.
  • the emitted radiation may be directed by the microscope onto detection elements, such as a charge-coupled device (CCD) camera.
  • the microscope may have unique optical filters each connected to a CCD camera, each optical filter corresponding to one label from a plurality of complexing agents so that each label may be detected and optionally recorded by the CCD camera.
  • the CCD camera may then convert the emitted radiation into an electrical signal that may be read by a computer.
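The filter-per-label arrangement above lends itself to a simple decoding step once the CCD signals reach the computer. The following is a minimal sketch under stated assumptions: the channel names, the channel-to-residue table, the intensity threshold, and the function `call_residue` are all hypothetical, and the strongest channel above threshold is taken as the call.

```python
from typing import Dict, Optional

# Hypothetical mapping from optical filter channel to the residue whose
# complexing agent carries the label seen through that filter.
CHANNEL_TO_RESIDUE: Dict[str, str] = {
    "filter-520nm": "A",
    "filter-580nm": "G",
    "filter-670nm": "K",
}


def call_residue(intensities: Dict[str, float], threshold: float = 100.0) -> Optional[str]:
    """Return the residue for the brightest known channel, or None.

    `intensities` holds one background-corrected reading per channel for
    a single surface location. Readings below `threshold` are treated as
    no binding (an illustrative cutoff, not a value from the source).
    """
    known = {ch: v for ch, v in intensities.items() if ch in CHANNEL_TO_RESIDUE}
    if not known:
        return None
    channel = max(known, key=lambda ch: known[ch])
    if known[channel] < threshold:
        return None
    return CHANNEL_TO_RESIDUE[channel]


if __name__ == "__main__":
    spot = {"filter-520nm": 12.0, "filter-580nm": 840.0, "filter-670nm": 30.0}
    print(call_residue(spot))
```

A real system would also need to handle spectral overlap between labels; this sketch ignores crosstalk for clarity.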
  • Luminescence of the label detected by the microscope CCD may be fluorescence.
  • TIRF total internal reflectance fluorescence
  • Dye labels may be laser-excited using confocal, evanescent-wave or other geometries for low background detection of the individual labels. Radiolabels may be detected using standard radiometric techniques.
  • the complexing agent may comprise a metal colloidal particle.
  • the metal colloidal particles preferably provide for strong absorption without substantial loss of the complexing agents' affinity for their binding or bindable counterpart. Depending on the circumstances, metal colloidal particles may be detected, for example, by direct visual examination, by microscopic techniques or spectrophotometric techniques.
  • the methods herein described may additionally comprise the step of contacting a secondary factor to the complexing agent that is bound to the N-terminal amino acid moiety to boost or modulate the signal derived from the binding of the complexing agent to the N-terminal amino acid moiety of the polypeptide, which may increase the sensitivity of the method.
  • This secondary factor may be a second complexing agent as herein described such as a labeled aptamer, antibody, protein or compound that recognizes the complexing agent, the complex, or a tag that may be bound to the complexing agent.
  • the secondary factor may provide for fluorescence resonance energy transfer (FRET) or may quench the fluorescence.
  • FRET fluorescence resonance energy transfer
  • Such secondary factors include, for example, green or red fluorescence protein (GFP and RFP, respectively) and their derivatives.
  • the polypeptide may be digested before identifying the N-terminal amino acid or before determining at least a portion of the polypeptide sequence.
  • Digested polypeptide includes polypeptide cleaved at specific amino acids into smaller peptides ("digested peptide") that may then be sequenced by the methods herein described.
  • the method herein described includes forming a complex between an N-terminal amino acid moiety of a corresponding digested peptide from the polypeptide of interest and a complexing agent, as described above. Reconstruction of the polypeptide sequence may be provided using the determined sequences of the one or more of the digested polypeptides.
  • Specific cleavage of polypeptides into digested peptides may be achieved by chemical or enzymatic methods.
  • the mixture of peptides obtained by specific chemical or enzymatic cleavage may be identified and/or sequenced or they may be separated prior to identification and/or sequencing, for example, by chromatography or SDS-PAGE.
  • Non-limiting methods of specifically cleaving polypeptide chains are provided in Table 1. Table 1.
  • the methods herein described may be useful for polypeptides of either known or unknown structure.
  • a combination of cleavage agents may be designed to verify or confirm the putative structure or sequence.
  • the amino acid sequences of digested segments of the original polypeptide may be recombined, for example, by overlapping peptide sequences derived from the polypeptide cleaved with a second enzyme/chemical agent that cleaved the polypeptide chain at different linkages.
  • for example, trypsin-cleaved sequences may be overlapped with chymotryptic sequences derived from the same polypeptide.
  • Such overlapping peptide analysis procedures may be integrated into the method herein described.
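The overlap-based recombination described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of the disclosure: the function names, the greedy merging strategy, the minimum overlap of three residues, and the example fragments are all assumptions introduced here.

```python
from typing import List, Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two peptides if a suffix of `left` equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def assemble(fragments: List[str], min_overlap: int = 3) -> List[str]:
    """Greedily merge fragments until no pair overlaps by `min_overlap` residues."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    merged = True
    while merged and len(contigs) > 1:
        merged = False
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                joined = merge_by_overlap(a, b, min_overlap)
                if joined is not None:
                    # Replace the two merged fragments with the joined one and restart.
                    contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
                    contigs.append(joined)
                    merged = True
                    break
            if merged:
                break
    return contigs


# Hypothetical partial sequences obtained from two different digests.
fragments = ["TAYIAK", "QISFVK", "MKTAY", "IAKQRQISF"]
reconstructed = assemble(fragments)
```

A greedy merge of this kind can be misled by repeated motifs, so a practical implementation would likely score alternative assemblies rather than accept the first overlap found.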
  • purification or isolation of all or a portion of the sample may be performed, for example, using SDS-gel electrophoresis under reducing conditions. The number of distinct N-terminal amino acids or the total polypeptide content may be determined prior to a separation/purification procedure.
  • the sample containing the polypeptide of interest may be denatured using denaturing agents, such as urea or guanidine hydrochloride, before sequence determination of the individual chains.
  • the sample containing the polypeptide of interest may be treated with reducing agents such as 2-mercaptoethanol or dithiothreitol to separate disulfide bonds or deactivate thiols.
  • the sample containing the polypeptide of interest may be alkylated with iodoacetate to form stable S-carboxymethyl derivatives to prevent cysteine residues from recombining. Identification and/or sequencing may then be performed as heretofore described.
  • a protein may be modified after translation. Specific side chains of the protein may be altered.
  • Polypeptides of interest may be bound to another molecule such as, for example, glycolipids or a glycan moiety. Such polypeptides may be analyzed according to the methods herein described while still attached to the molecule or such other molecule may be cleaved prior to analyzing the polypeptide. The methods herein described may provide for determining information as to whether an amino acid sequence comprising post-translational modifications is present or that a specific modified amino acid is present. For example, an aptamer with specificity for a known post-translational modification may be used.
  • the method for sequencing the polypeptide herein disclosed may include the step of removing one or more of the N-terminal amino acids with an N-terminal amino acid removing reagent.
  • the N-terminal amino acid removing reagent(s) that may be useful according to the methods herein described will depend upon the nature of the N-terminal amino acid of the polypeptide and the sequence or type of structural information desired.
  • Several N-terminal amino acid removing reagents are known in the art for polypeptides. Aminopeptidases are commercially available, for example from reagent suppliers such as Sigma Chemicals (St. Louis, Mo.) and Oxford
  • Chemical agents may include acids, for example, for providing hydrolysis of the N-terminal amino acid moiety in accordance with the methods disclosed herein.
  • aminopeptidases may be developed using mutagenesis of known aminopeptidases to produce aminopeptidases having different N-terminal amino acid removing capabilities.
  • aminopeptidases may be developed using mutagenesis techniques to produce aminopeptidases having specificity for different N-terminal amino acids. Methods for developing such aminopeptidases are provided in the Examples below.
  • N-terminal amino acid removing reagents may be suitable for elucidating the structure of the polypeptide including its sequence according to the method herein described. Combinations of the above-described individual removing agents may be used. For example, chemical removing agents may be used with enzymatic removing agents. Two or more removing agents may be used simultaneously or sequentially on a polypeptide. The specific combination and the circumstances under which such a combination is appropriate will depend upon the nature of the polypeptide and the information desired.
  • the N-terminal amino acid removing agent comprises the agents of the Edman degradation procedure.
  • the Edman degradation procedure comprises forming a derivative of the N-terminal amino acid by reacting a phenyl isothiocyanate (PITC) with the N-terminal amino acid under basic conditions (e.g., n-methylpiperidine/methanol/water) to form a phenylthiocarbamyl derivative (PITC-N-terminal amino acid).
  • PITC phenyl isothiocyanate
  • Trifluoroacetic acid may then be used to cleave off the PITC-N-terminal amino acid as its anilinothiazolinone derivative (ATZ-amino acid), leaving a new amino terminus.
  • the ATZ-amino acid may be removed by extraction with n-butyl chloride and converted to a phenylthiohydantoin derivative (PTH-amino acid) with 25% TFA/water for complementary analysis if desired (for example, using a reverse-phase C-18 column with UV detection at 280 nm).
  • PTH-amino acid phenylthiohydantoin derivative
  • Unmodified cysteine residues, which may interfere with the Edman degradation procedure as described above, may be modified by known methods and chemistries.
  • Blocked amino termini e.g., an amino terminus that is glycosylated or phosphorylated
  • De-blocking procedures may be performed by known chemistries and methods.
  • the methods herein described may be complemented with other methods of determining the N-terminal amino acids or other methods of determining the sequence of a polypeptide, concurrently or subsequently with the methods herein described.
  • the cleaved terminal amino acid(s) may be concurrently or subsequently characterized using HPLC or mass spectrometric methods.
  • the phenylthiocarbamoyl derivative of the N-terminal amino acid may be liberated and the cyclic phenylthiohydantoin (PTH)-amino acid may be identified by chromatographic methods such as high-pressure liquid chromatography, gas-phase sequenators, mass spectroscopy and/or other procedures.
  • the complementary procedure may be repeated as needed or desired.
  • the methods herein described may be complemented with protein purification procedures such as SDS-polyacrylamide gel electrophoresis. Such procedures may be automated and/or integrated with the methods herein described.
  • the methods described herein may provide a process for determining at least a portion of an amino acid sequence of a polypeptide of interest.
  • the process comprises forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for at least one N-terminal amino acid moiety of the polypeptide.
  • complexing agents having specificity for different N-terminal amino acid moieties may be contacted with the polypeptide; for example, the complexing agents may be selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. Detecting the label in the complex provides for identifying the N-terminal amino acid of the polypeptide.
  • the complexing agent and/or the N-terminal amino acid moiety may be removed from the polypeptide such that the adjacent amino acid may be identified in order to determine at least a portion of the amino acid sequence of the polypeptide of interest.
  • This cycle may be repeated, for example, 6-10 times in order to obtain enough sequence information for identification of a known peptide sequence, or repeated, for example, 10-16 times for characterization of an unknown sequence. More or fewer repetitions of this cycle may be used.
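The identify-then-remove cycle, and the way the number of cycles relates to identifying a known peptide, can be illustrated with a short Python sketch. It assumes an idealized cycle in which every N-terminal residue is read correctly; the function names and the three-peptide library are hypothetical.

```python
from typing import List, Optional


def read_n_terminal_cycles(peptide: str, n_cycles: int) -> str:
    """Simulate repeated 'identify the N-terminal residue, then remove it' cycles."""
    remaining = peptide
    calls = []
    for _ in range(n_cycles):
        if not remaining:
            break  # peptide fully consumed
        calls.append(remaining[0])  # identification step: read the current N-terminus
        remaining = remaining[1:]   # removal step: expose the next residue
    return "".join(calls)


def cycles_to_unique(target: str, library: List[str]) -> Optional[int]:
    """Smallest number of cycles after which only `target` matches in `library`."""
    for n in range(1, len(target) + 1):
        prefix = read_n_terminal_cycles(target, n)
        hits = [p for p in library if p.startswith(prefix)]
        if hits == [target]:
            return n
    return None


# Hypothetical library of known peptide sequences.
library = ["GSHMLEDPVK", "GSHMAEDPVK", "GTHMLEDPVK"]
needed = cycles_to_unique("GSHMLEDPVK", library)
```

In this toy library five cycles are enough to single out the target; a realistic library would generally need more, which is in line with the ranges mentioned above.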
  • the above method may be carried out in solution or may be adapted to a surface, solid support, beads, and/or array or microarray.
  • the method of identifying and/or sequencing a polypeptide of interest may be optimized for automation using a solid support or an array.
  • the polypeptide or a digested portion of the polypeptide
  • the surface of the solid support or the array may provide bonding in an orderly fashion.
  • the surface may comprise equally spaced binding sites for one or more of the same or different polypeptides.
  • the binding site may be situated such that the C-terminus of the polypeptide will bind preferentially.
  • the surface may be treated to reduce non-specific binding.
  • the surface may be patterned to facilitate containment of the polypeptide to a region on the surface and/or create a reaction chamber to facilitate the binding of the polypeptide to the surface or to facilitate complex formation/removal or terminal amino acid removal.
  • the polypeptide may be covalently or non-covalently attached to the surface.
  • polypeptides may be sequenced using a solid support or microarray comprising a surface amenable to bonding of the polypeptide.
  • a method for sequencing may comprise bonding the polypeptide to a solid support and forming a complex between the N-terminal amino acid and a labeled complexing agent, and a detection step comprising the detection of the label of the complex.
  • Solid supports known in the art to be useful for binding to polypeptides include, for example, glass beads, cellulose beads, polystyrene beads, SEPHADEX beads, SEPHAROSE beads, polyacrylamide beads and agarose beads (see, e.g., Ghosh, et al., "Covalent Attachment of Oligonucleotides to Solid Supports," Nucleic Acids Research. 15:(13) 5353-5372 (1987); U.S. Pat. No. 4,992,383 (Farnsworth); both of these references are hereby incorporated by reference herein).
  • the solid support may be silica, silicon, glass such as borosilicate glass, or plastic functionalized to enable covalent or non-covalent coupling.
  • Functionalized surfaces of the solid support may be obtained using conventional silanization methods to incorporate reactive groups, or by thin-film deposition of polymers containing reactive functional moieties.
  • the functional group is chosen to facilitate covalent binding of polypeptides, preferably through the C-terminus.
  • the surface is otherwise passive to the adsorption of complexing agents that bind the polypeptides.
  • the functional group is an amine that terminates a surface-bound linker, to which the C-terminal amino acid may covalently couple, for example, in the presence of imidazole and a carbodiimide (e.g., EDAC).
  • the surface may additionally be patterned, such as in a microarray.
  • patterning such as patterns of hydrophilic patches separated by hydrophobic regions, or patterns of surface depressions (nanowells) may be used, which may be obtained by replication from a master generated by standard lithographic techniques.
  • the polypeptide may be attached at both the C-terminus and N-terminus followed by cleavage of at least one peptide bond of the bound polypeptide to provide at least one accessible N-terminal amino acid.
  • the surface may be contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection, under conditions that favor complexing agent binding with the N-terminal amino acid moiety of the polypeptide(s).
  • the surface onto which the polypeptide is deposited may be washed before and after a complexing agent is bound to the polypeptide.
  • the identity of the N-terminal amino acid for the polypeptide at a specific location on the surface may be determined as described above.
  • a compilation of the amino acids corresponding to a particular location on the surface may be generated and/or stored digitally for manipulation.
  • the portion of determined amino acid sequence corresponding to the polypeptide may comprise gaps (e.g., corresponding to amino acids not identified). Gaps may be included in a compiled sequence for a polypeptide, for example, as a random amino acid or an amino acid of a particular functionality, acidity, etc.
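One way to picture the digital compilation of per-location reads, including gaps, is the following Python sketch. The data layout (a mapping from array coordinates to per-cycle calls) and the use of "X" as the gap placeholder are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

GAP = "X"  # assumed placeholder for a residue that could not be identified

Location = Tuple[int, int]  # (row, column) on the array surface


def compile_sequences(
    calls_by_location: Dict[Location, List[Optional[str]]]
) -> Dict[Location, str]:
    """Turn per-cycle residue calls at each location into one sequence string."""
    return {
        loc: "".join(call if call is not None else GAP for call in calls)
        for loc, calls in calls_by_location.items()
    }


# Hypothetical reads: None marks a cycle where no label was detected.
reads = {
    (0, 0): ["G", "S", None, "M", "L"],
    (0, 1): ["A", "V", "L", None, None],
}
compiled = compile_sequences(reads)
```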
  • N-terminal amino acid moiety may be sequentially removed and the above sequence of steps repeated to provide at least a portion of the sequence of a polypeptide.
  • the sequencing method comprises the step of identifying sequential N-terminal amino acids of a polypeptide, such as a protein, to provide for identification of the protein.
  • the sequencing method may optionally comprise using an N-terminal amino acid removing reagent to remove the N-terminal amino acid from the polypeptide.
  • a microarray-based polypeptide sequencing procedure is described below for a polypeptide of interest.
  • Amino acid-at-a-time sequencing of polypeptide is accomplished by the repeated sequential identification and removal of the N-terminal amino acid of a polypeptide whose sequence of amino acids is to be determined.
  • the polypeptide of interest may be purified or present in a mixture of polypeptides and/or may be digested.
  • the polypeptide or digested peptide may be held fixed, at the C-terminal end or other position distal to the N-terminal amino acid to be removed, at a specific location on the microarray support or surface.
  • the process may be multiplexed, such that complexing and N-terminal amino acid moiety removal may be performed on the array surface.
  • a plurality of spatially separated locations (N×M array) may be provided on the surface, with each location containing the polypeptide or a single digested peptide from one polypeptide. Subsequent parallel processing and readout of the surface-bound polypeptides may greatly improve the effective sequencing rate.
  • the use of a microarray may provide for mapping of the polypeptides on the surface and the recording of the sequence of amino acids from the polypeptide at specific locations on the surface. The quantity of an identified polypeptide on a surface may be determined to provide absolute quantification of the polypeptide in a sample.
  • the steps for detecting and identifying the N-terminal amino acids and/or sequencing at least a portion of polypeptides spatially arranged on a microarray may be performed as follows.
  • One or more complexing agents such as a collection of aptamers, each aptamer specific to an N-terminal amino acid moiety, and each aptamer having a unique label, preferably a dye or group of dyes or dye-dye pairs (fluorescence resonance energy transfer (FRET)) that yield a distinguishable detectable measurement, e.g., spectral or temporal luminescence properties, may be contacted with the surface-bound polypeptide.
  • FRET fluorescent Resonant Energy Transfer
  • the concentration of each type of aptamer may be adjusted to be approximately 10-100 times the value of the known or estimated equilibrium binding constant for its N-terminal amino acid moiety ligand(s) such as to provide a sufficient equilibrium concentration for complex formation.
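The rationale for working at 10-100 times the binding constant can be made concrete with a simple calculation. The sketch below assumes a 1:1 binding model with the aptamer in large excess over its ligand, and it assumes that the "equilibrium binding constant" is a dissociation constant (Kd); neither assumption is stated in the text above.

```python
def fraction_bound(aptamer_conc: float, kd: float) -> float:
    """Fraction of N-terminal sites occupied at equilibrium (1:1 model, aptamer in excess)."""
    return aptamer_conc / (aptamer_conc + kd)


kd = 1.0  # arbitrary units; only the ratio of concentration to Kd matters here
occupancy_10x = fraction_bound(10 * kd, kd)    # about 0.91
occupancy_100x = fraction_bound(100 * kd, kd)  # about 0.99
```

Under these assumptions, a tenfold excess over Kd occupies roughly 91% of sites and a hundredfold excess roughly 99%, so further increases add little occupancy while likely increasing non-specific background.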
  • the substrate microarray containing the surface-bound polypeptides may be incubated after contact with the collection of aptamers for a sufficient time to allow equilibrium to be reached. This surface may then be washed to substantially reduce or eliminate non-bound aptamers and the weak, non-specifically-bound or surface-bound aptamers. The wash time should be short enough so that N-terminal amino acid moiety-bound aptamers are not substantially removed. Excess aptamer may be collected and recycled.
  • the surface may be dried to further immobilize the specifically-bound aptamers at the location of their respective N-terminal amino acid ligand.
  • the substrate may then be scanned under appropriate conditions, and the labeled aptamers detected and/or recorded as a function of their specific locations on the surface. By detection and discrimination of the complexes, a map of the identity and location of N-terminal amino acid moieties on the surface may be obtained and optionally archived.
  • the sequence of a polypeptide of interest or a portion thereof may be compared with some or all other known polypeptide sequences. Comparison may be to identify the polypeptide and/or ascertain similarities, evolutionary history, or identify pathogenic origin or similarity and the like. Comparison of a relationship of a sequence obtained by the methods herein described with that of some or all known polypeptide sequences may be performed using a computer containing or having access to a library of sequences. Comparison of the identified sequence may include database searching techniques.
  • the methods herein disclosed may include reconstructing protein sequences obtained from sequenced N-terminal amino acids identified from a sample so as to identify protein in a sample.
  • the methods disclosed may be useful for determining whether an organism has been modified and/or whether the type of modification made (e.g., increasing lethality) has an impact on its virulence. This may be accomplished by taking a single sample, dividing it into portions, and then digesting each portion with a different protease. The peptides generated will then have overlapping sequences that make it possible to reconstruct the original protein using standard protein reconstruction techniques.
  • tryptic peptides combined with peptides from a Glu-C digest, which cleaves at aspartic and glutamic acids, may be used to reconstruct the sequence from a B. anthracis spore coat protein.
  • Database calculations may be performed to determine how many proteases would be necessary to reconstruct a large number of proteins, for example, from B. anthracis, given that a mixture of digested peptides from a large number of different proteins will be present.
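A database calculation of this kind would typically start from an in-silico digest. The Python sketch below uses deliberately simplified cleavage rules (trypsin cutting on the C-terminal side of K or R, Glu-C on the C-terminal side of D or E, with no exceptions or missed cleavages); the function name and the example sequence are hypothetical and are not taken from B. anthracis.

```python
from typing import List


def digest(protein: str, cleave_after: str) -> List[str]:
    """Cut `protein` on the C-terminal side of every residue listed in `cleave_after`."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


protein = "MKTAYDIAKQRELSFVK"            # hypothetical sequence
tryptic = digest(protein, "KR")           # simplified trypsin rule
glu_c = digest(protein, "DE")             # simplified Glu-C rule
```

Because the two proteases cut at different positions, peptides from one digest span the cut sites of the other, which is what makes overlap-based reconstruction possible.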
  • Sequences generated with missing amino acids may still be searchable and/or provide sufficient information for identification by database searching techniques. Missing amino acids in the sequence result in different database searching problems, for example, gaps in the sequence, gaining a "phantom" missing amino acid, or detecting the same amino acid twice. These issues may be compounded by the possibility of amino acids without aptamers or having aptamers that are not specific to one amino acid.
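A gap-tolerant lookup can be sketched as follows. This illustration handles only the simplest case, where an unidentified position is recorded as a placeholder ("X") and the read is compared against the N-terminal end of each library entry; it does not handle a "phantom" residue or a residue detected twice, which would require alignment with insertions and deletions. The names and library contents are hypothetical.

```python
from typing import Dict, List


def matches_prefix(read: str, candidate: str, gap: str = "X") -> bool:
    """True if `read` agrees with the start of `candidate`, treating `gap` as a wildcard."""
    if len(read) > len(candidate):
        return False
    return all(r == gap or r == c for r, c in zip(read, candidate))


def search(read: str, library: Dict[str, str]) -> List[str]:
    """Names of all library peptides compatible with the (possibly gapped) read."""
    return [name for name, seq in library.items() if matches_prefix(read, seq)]


# Hypothetical library of known peptides.
library = {"pepA": "GSHMLEDPVK", "pepB": "GSHMAEDPVK", "pepC": "GTHMLEDPVK"}
ambiguous = search("GXHML", library)   # the gap hides the only difference between pepA and pepC
resolved = search("GSXML", library)    # the gap falls on a position shared by all candidates
```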
  • the most important amino acids may be assumed to be those that are the most common in proteins, such as leucine, serine, alanine, glycine, etc., with the exception that lysine and arginine will be the least useful since they typically are found at the C-terminus, which will likely often not be sequenced.
  • the effect of missing multiple aptamers may be determined. Because of the computationally high number of combinations of multiple missing amino acid specific aptamers (ASAs) that would be generated, only the worst and best case scenarios may be employed, for example, that of losing the most informative amino acids and that of losing the least informative amino acids, respectively.
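The effect of lacking aptamers for particular amino acids can be estimated on a sequence library without enumerating every combination. The sketch below is an assumed, simplified scoring scheme: residues without an aptamer are masked as "X" (meaning "one of the missing residues"), and a peptide counts as identifiable if its masked read is unique in the library. The library and the choice of missing residues are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def mask(seq: str, missing: Iterable[str], gap: str = "X") -> str:
    """Replace residues that have no specific aptamer with a gap symbol."""
    missing = set(missing)
    return "".join(gap if residue in missing else residue for residue in seq)


def unique_fraction(library: List[str], missing: Iterable[str], read_length: int) -> float:
    """Fraction of library peptides whose masked N-terminal read is still unique."""
    reads = [mask(seq[:read_length], missing) for seq in library]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] == 1) / len(reads)


library = ["GSHMLE", "GSHMAE", "GTHMLE", "ASHMLE"]     # hypothetical peptides
baseline = unique_fraction(library, [], 6)              # every aptamer available
lose_common = unique_fraction(library, ["L", "A"], 6)   # losing two informative residues
lose_rare = unique_fraction(library, ["T"], 6)          # losing a less informative residue
```

In this toy library, losing the aptamers for L and A halves the fraction of identifiable peptides, whereas losing the aptamer for T costs nothing, which mirrors the worst-case and best-case comparison described above.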
  • the methods disclosed herein may be used to directly sequence single peptide molecules as an assessment tool of virulence, organism viability, or infection (evaluating either plasma, serum, urine, or any other tissue or fluid that may contain peptide molecules). Information regarding bioengineered resistance genes or other genetic manipulation of organisms may be obtained by the methods herein disclosed.
  • Referring now to the Figures, various illustrative embodiments will be described. A general overview of the methods described herein is depicted in FIG. 1. Thus, a sample containing a polypeptide of interest may be extracted and digested as depicted in steps A and B using standard protocols.
  • the resulting peptides from the sample may be bound to a surface such as a microscope slide or microarray format, as depicted in step C.
  • the peptides may be bound to the surface such that the N-terminal amino acid is available for interaction with agents or for chemical modification.
  • the N-terminal amino acid may be derivatized for an N-terminal amino acid removal procedure or the derivative may be formed before or after complex formation with the complexing agent.
  • the bound peptides of the sample may be contacted with an N-terminal amino acid specific complexing agent as depicted in step D of FIG. 1.
  • Detection of the label corresponding to the complexing agent of the N-terminal amino acid-complexing agent complex provides for determining the identity of the N-terminal amino acid. Such detection may be at the single molecule detection level. Suitable detection methods may include total internal reflectance fluorescence microscopy (TIRF). Association of the N-terminal amino acid of a specific peptide may be correlated with a corresponding position on the slide or microarray. This information may be stored digitally such that it is readily accessible for searching, additional manipulation, or archiving as depicted in step E.
  • TIRF total internal reflectance fluorescence microscopy
  • the complexing agent or set of complexing agents may be removed or washed off the slide.
  • the sample may be subjected to an N-terminal amino acid removal procedure to remove the N-terminal amino acid of the peptide.
  • the N-terminal amino acid may be removed, as depicted in step F, and the peptides may again be contacted with the complexing agent or set of complexing agents with subsequent identification of the amino acid.
  • the complexing agent(s) may be the same as those used previously or they may be different.
  • Detection of the detectable label of the complexing agent-N-terminal amino acid complex provides for determining the identity of the next sequential N-terminal amino acid.
  • the process may be repeated a number of times such as to obtain at least a portion of the sequence of the peptide.
  • the portion of the sequence of the peptide may be used for identification of the peptide and/or the protein from which the peptide was derived.
  • In FIG. 2, an example of a peptide sequencing assay is depicted in which an aptamer complexing agent complexes with a surface-bound peptide derivative to provide a detectable complex.
  • a surface bound peptide having an N-terminal amino acid is contacted with phenylisothiocyanate compound as depicted in step 1.
  • the derivatized amino acid is contacted with labeled aptamer as depicted in step 2.
  • Aptamer-amino acid complex is detected using imaging techniques as depicted in step 3.
  • N-terminal amino acid removal, for example, using an Edman degradation procedure, removes the N-terminal amino acid of the peptide as depicted in step 4.
  • Steps 1-3 may be repeated or steps 1-4 may be repeated to provide at least a portion of the sequence of the surface bound peptide.
  • the lack of specific sequence information corresponding to that specific N-terminal amino acid of the peptide may be indicated, for example, as a random amino acid in the sequence.
  • In FIG. 3, an example of a peptide sequencing assay using an N-terminal amino acid removing agent specific for one or more N-terminal amino acids is depicted.
  • peptides are produced from a protein sample by tryptic digestion, as depicted in step 1.
  • the tryptic peptides having N-terminal amino acids are bound to the surface of a slide by covalently attaching the peptides using isothiocyanate as depicted in step 2.
  • the N-terminal amino acid of each peptide is labeled using biotin as depicted in step 3. Labeled amino acids are detected using imaging techniques as depicted in step 4.
  • N-terminal amino acid removal, for example, using a specific N-terminal amino acid aminopeptidase, removes the respective N-terminal amino acid of the peptide as depicted in step 5.
  • Steps 4-5 may be repeated for each specific N-terminal amino acid aminopeptidase to provide at least a portion of the sequence of the surface bound peptide.
  • steps 3-5 may be repeated to obtain the identity of the subsequent N-terminal amino acids. The steps are repeated until the desired number of amino acids in a sequence is identified.
  • the methods herein described may be complemented by determining the total amino acid composition of the peptide.
  • the polypeptide sample may be hydrolyzed into its constituent amino acids by heating it in 6 N HCl at 110 °C for 24 hours. Amino acids in hydrolysates may be separated and characterized, for example by ion-exchange chromatography on columns of sulfonated polystyrene or mass spectrometry.
  • Quantification of the peptide may be obtained by reaction with ninhydrin or fluorescamine and determination of the optical absorbance of the solution.
  • the methods herein described may provide for sequencing or characterizing a single polypeptide or a quantitative or qualitative amount of polypeptide, for example at the single-molecule level or in the sub-femtomolar range.
  • the quantity of that portion of that polypeptide of interest (or the corresponding polypeptide of origin, e.g., protein) may be quantified.
  • the amount of quantified polypeptide may be ratiometrically compared with the total polypeptide content of the sample to provide an absolute quantification of the polypeptide of interest.
  • polypeptides of interest from a sample may be randomly positioned on a surface such as a microchip or microarray.
  • the surface of the microchip or microarray may be scanned to determine the quantity of polypeptide of interest on the surface and quantified as a proportion of the total polypeptide content of the sample.
  • Absolute quantitation of peptides using the methods herein disclosed may be obtained through counting the number of times each unique peptide is identified. By correlating the number of times a peptide is identified with the amount of peptides loaded (or the number of cells used) an absolute copy number of that molecule in the sample or cell may be determined.
  • tryptically digested proteins from a sample may provide multiple peptides representing each protein in the assay.
  • the quantities of the peptides may be averaged and statistics used to minimize sampling error and/or more accurately determine the quantity of the protein.
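Counting-based quantitation with averaging over several peptides per protein might look like the sketch below. It assumes that each identification corresponds to one molecule and that a known fraction of the loaded sample is actually observed; the function name, the `fraction_observed` parameter, and all numbers are hypothetical.

```python
from statistics import mean
from typing import Dict


def copies_per_cell(
    peptide_counts: Dict[str, int],
    peptide_to_protein: Dict[str, str],
    n_cells: int,
    fraction_observed: float = 1.0,
) -> Dict[str, float]:
    """Average peptide counts per protein and scale to an estimated copy number per cell."""
    per_protein: Dict[str, list] = {}
    for peptide, count in peptide_counts.items():
        per_protein.setdefault(peptide_to_protein[peptide], []).append(count)
    return {
        protein: mean(counts) / fraction_observed / n_cells
        for protein, counts in per_protein.items()
    }


# Hypothetical counts: two peptides report on protA, one on protB.
counts = {"pep1": 980, "pep2": 1020, "pep3": 500}
mapping = {"pep1": "protA", "pep2": "protA", "pep3": "protB"}
estimate = copies_per_cell(counts, mapping, n_cells=100, fraction_observed=0.1)
```

With these numbers the estimate is about 100 copies per cell for protA and about 50 for protB. A large disagreement between peptides of the same protein would not be averaged away in practice but examined separately.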
  • the multiple peptides of the digested protein cross-validate each other's determination of the protein level, with any discrepancies signaling a potentially relevant event, such as a post-translational modification (PTM) or alternative splicing, for example.
  • PTM post-translational modification
  • the methods herein disclosed may provide profiling of the entire protein content of the organism and the specific ways in which the organism has been modified may be identified without a priori knowledge of what those genetic changes may be.
  • Structural information obtained from the methods herein described includes, for example, information or suggestion regarding attributes of the primary structure of the polypeptide which may be derived from the interaction of the complexing agent with the N-terminal amino acid of the polypeptide, the amino acid composition of the polypeptide, the order in which the amino acids are linked and amino acid post-translational modification.
  • N-terminal Amino Acid Specific Aptamer (ASA) Screening
  • selection of complexing agents suitable for use in the methods herein described may be derived by creating an affinity column with an amino acid, derivatized amino acid or post-translationally modified amino acid attached to it. Mixtures of random complexing agents may be screened for affinity and selectivity using the affinity column, and then isolated and/or optionally amplified to provide complexing agents that bind with the amino acid.
  • aptamer selection may be carried out, e.g., following the methods of Gold, et al. (U.S. Pat. No. 5,270,163), which describes the "SELEX" (Systematic Evolution of Ligands by Exponential Enrichment) method.
  • the SELEX process may be used to screen for amino acid and/or PITC derivatized amino acid selectivity, for example.
  • the ligand of interest in this case an individual amino acid(s) is conjugated to a support matrix.
  • the support matrix may be a chromatography media with free amino groups so that the amino acids can be conjugated through their carboxyl groups.
  • the PITC reagent is reacted with the amino acids on a column.
  • An aptamer library may then be synthesized with a pair of known primers surrounding a region of random nucleotides.
  • aptamers typically are used as the starting pool for selection, which roughly equals 100s of picomoles of aptamers.
  • the aptamer library may then be run over the column so that mainly those molecules that bind either the epitope or the support matrix remain. These aptamers may then be affinity eluted with the appropriate amino acid or with high salt.
  • This first round aptamer pool may then be amplified by PCR. This process may be repeated from 5-20 times to isolate highly specific aptamers.
  • the above screening protocol may be adjusted with the following modifications and additions. First, in later rounds of PCR, random mutagenic PCR may be used to provide possible nucleotide permutations to find the most specific aptamer structure.
  • counter-selection on support matrix derivatized with PITC may be used to remove aptamers that recognize either the support material or only the PITC motif.
  • Nontraditional nucleotides, such as nucleotides derivatized with amines or fluorine, or RNA aptamers may be used in aptamer selection.
  • Selection media for the SELEX process may be CarboxylLink media (Pierce, Rockford, IL). Selection column size may be approximately 100 µL of media packed into miniature homemade columns. Fmoc derived amino acids (Bachem, King of Prussia, PA) may be coupled to the media through reactions with 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) in 100 mM MES Buffer (100 mM 2-(N-morpholino)ethanesulfonic acid, 0.9% NaCl) (Pierce, Rockford, IL).
  • Fmoc blocked amino acids may stop the reaction from forming polymer chains of the amino acids on the media.
  • the Fmoc group may then be released by washing the column with base.
  • the column may then be washed with methanol/10 mM HEPES pH 8.3 (methanol/HEPES) to prepare for PITC coupling.
  • Each column may then be equilibrated with 1% PITC in methanol/HEPES and allowed to stand at RT for 30 minutes. To drive the reaction to 100% completion, the PITC incubation may be repeated three times.
  • a counter selection column may be generated by simply conjugating the media with PITC reagent.
  • One nmol of synthesized aptamer library may be dissolved in 300 µl of Aptamer Selection Buffer (AS buffer), which consists of 100 mM HEPES pH 7.0, 250 mM NaCl, 5 mM MgCl2.
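As a minimal sketch of the arithmetic behind the library loading step, the following Python snippet converts the stated 1 nmol in 300 µl into a molar concentration and an approximate molecule count. The function name `library_stats` is hypothetical and is used only for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def library_stats(amount_nmol: float, volume_ul: float) -> tuple[float, float]:
    """Return (concentration in micromolar, number of molecules)."""
    moles = amount_nmol * 1e-9          # nmol -> mol
    litres = volume_ul * 1e-6           # microlitres -> litres
    concentration_um = moles / litres * 1e6  # mol/L -> micromolar
    molecules = moles * AVOGADRO
    return concentration_um, molecules

conc, count = library_stats(1.0, 300.0)
print(f"{conc:.2f} uM, {count:.3e} molecules")
```

Under these inputs the library is roughly 3.3 µM and contains on the order of 6 x 10^14 molecules.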
  • the aptamer may then be heated to 95° C for 10 minutes and then snap frozen on ice to facilitate folding.
  • the column may then be washed with 10 column volumes of AS Buffer to remove unbound ASA's.
  • ASA's may be eluted with 5 mM of the appropriate amino acid or AA-PITC in AS Buffer.
  • 1 M salt in AS Buffer may be used to elute the aptamers, though more rounds of selection and counter selection may likely be necessary.
  • the elution buffer may be heated to 95° C directly before running onto the column to cause aptamers with very low Kd to dissociate from the column.
  • the elution buffer may then be allowed to cool on column so that those aptamers that recognize the column rebind.
  • the eluted aptamers may then be precipitated with ethanol and then amplified by PCR. Two rounds of PCR may be performed for every round of aptamer selection.
  • the first round of PCR may be just amplification of the eluted aptamers in selection rounds 1-2 and then mutagenic PCR in all later rounds.
  • the second round of PCR may use a 10 fold excess of the 3' primer (100 pmol versus 10 pmol of the 5' primer) to preferentially generate one strand, using a small amount of the first round PCR product as starting material.
  • counter selection columns may be used to further deplete those aptamers that show specificity to the media and to remove those aptamers that only recognize the PITC group.
  • aptamers that possess a high degree of amino acid specificity
  • mixtures of the five amino acids most structurally similar to the amino acid being selected may be washed over the column before elution, under conditions identical to those described for elution.
  • Isolation of individual aptamers may be accomplished by cloning the ASA's from rounds 10 and 15 of selection into plasmids. The plasmids may then be transformed into E. coli. The transformed E. coli may be selectively plated and individual colonies may be isolated.
  • the specificity of the aptamers may be tested against other amino acids.
  • individual amino acids and AA-PITCs may be conjugated to glass slides and then fluorescently labeled aptamer isolates may be incubated on the slide to determine binding strength.
  • Derivatized glass microscope slides covered with primary amines may be sectioned off into a series of individual wells using silicon mats (Grace Bio-Labs, Bend, OR).
  • Individual Fmoc labeled amino acids may be conjugated in each well by incubating with EDC in MES Buffer at RT for 24 hours. The Fmoc group may then be removed with base.
  • the slides may then be washed in methanol/HEPES buffer.
  • the amino acids may be derivatized with PITC by incubating each well with 1% PITC in methanol/HEPES buffer for 1 hour at 37° C.
  • the wells may then be washed with methanol/HEPES and AS buffers.
  • Plasmids isolated from individual E. coli colonies may then be used to PCR individual APs for testing.
  • the 3' primer may be present in 10 fold excess and may be conjugated to a Cy3 molecule.
  • the products may be cleaned using a PCR Cleanup Kit (Qiagen, Valencia, CA). Cy3 labeled PCR products may then be resuspended in AS Buffer and incubated on the slide in each well.
  • the amount of fluorescence on the slide may be assayed with a Typhoon Laser Scanner (Amersham, Piscataway, NJ) set to detect Cy3 molecules. Aptamers that show the greatest Cy3 signal intensity in the well of the appropriate amino acid compared to the other 19 amino acids may be used as aptamers for single molecule sequencing.
  • Quantum dots as detectable labels
  • Attachment of quantum dots to the amino-acid specific aptamers and optimization of the Edman degradation solution conditions may provide for maximization of PITC reactions and aptamer binding. Such optimization may provide attomole or zeptomole detection capability.
  • QDotsTM Invitrogen
  • Typhoon Scanner Hyward, CA
  • the methods herein disclosed may provide for the use of fewer than 20 uniquely specific amino acid complexing agents (e.g., aptamers) for peptide identifications. Probability calculations based on simple combinatorial numbers of recognizable amino acids, or groups of amino acids, as a function of the amount of sequence information, may be determined using standard statistical methods. The twenty amino acids may be subdivided into a limited number of distinguishable pools based on a predicted aptamer availability. For example, four aptamers specific to leucine, serine, isoleucine, and alanine would divide the 20 amino acids up into five pools with the fifth pool containing all unrecognizable amino acids.
  • aptamers sequenced over a continuous stretch of five amino acids would provide 3^5 (243) distinct combinations.
  • the E. coli proteome (80,000), human proteome (1,200,000), and human genome (600 million)
  • a limited number of aptamers of less than 20 is needed for the identification of a peptide from a specific, finite database.
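The pool counting described above may be sketched as follows, assuming each aptamer recognizes exactly one amino acid and all unrecognized amino acids share one additional pool. The function name `distinct_combinations` is hypothetical.

```python
def distinct_combinations(n_aptamers: int, read_length: int) -> int:
    """Number of distinguishable reads for a given aptamer count and read length."""
    pools = n_aptamers + 1  # one extra pool holds every unrecognized amino acid
    return pools ** read_length

# Three pools read over five positions give the 3^5 = 243 figure quoted above.
print(distinct_combinations(2, 5))
# Four aptamers (five pools) read over ten positions.
print(distinct_combinations(4, 10))
```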
  • the theoretical size of a database may be calculated by dividing the average size of a protein (e.g., 20 kDa for E. coli) by the average peptide size, 1000 Da, and multiplying that by the number of proteins in an existing database, for example, ~4000 proteins.
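The database size estimate may be worked through as a short Python sketch. The function name `theoretical_database_size` is hypothetical; the inputs are the E. coli figures given in the text.

```python
def theoretical_database_size(avg_protein_da: float,
                              avg_peptide_da: float,
                              n_proteins: int) -> int:
    """Average peptides per protein multiplied by the number of proteins."""
    peptides_per_protein = avg_protein_da / avg_peptide_da
    return round(peptides_per_protein * n_proteins)

# 20 kDa average protein, 1000 Da average peptide, about 4000 proteins.
print(theoretical_database_size(20_000, 1_000, 4_000))
```

With these inputs the estimate is 80,000 peptides, which agrees with the E. coli proteome figure quoted in this section.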
  • in Table 2, the number of theoretically distinguishable peptides is highlighted by the size of the peptide database that could be searched, assuming a 100 fold excess of distinguishable peptides to account for false positives.
  • the numbers represent the E. coli proteome, the human proteome, and the human genome.
  • the data in Table 2 are based on a 100 fold increase in selectivity for recognizing unique peptides over the total number of entries in the database. However, not all database peptides are unique, and masking some amino acids so that they are indistinguishable will likely decrease the number of unique peptides identifiable.
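One way to apply the 100 fold excess criterion is to find the shortest read length whose number of distinguishable combinations is at least 100 times the database size. The sketch below assumes every combination of pools is equally usable, which real peptide databases do not satisfy; the function name `min_read_length` is hypothetical.

```python
def min_read_length(n_pools: int, database_size: int, excess: int = 100) -> int:
    """Smallest read length with n_pools ** length >= excess * database_size."""
    if n_pools < 2:
        raise ValueError("at least two pools are needed to distinguish peptides")
    target = database_size * excess
    length, combinations = 0, 1
    while combinations < target:
        combinations *= n_pools
        length += 1
    return length

# Five pools (four aptamers plus "unknown") against an 80,000 peptide database.
print(min_read_length(5, 80_000))
```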
  • the tryptic peptide sequences in an E. coli proteome database may be converted into strings that represent amino acids with corresponding aptamers and amino acids with no corresponding aptamers.
  • the sample sequence ILPTEQSNAR may be transformed into ILuuuuSuAu ("u" for unknown amino acid) and then queried against the converted database.
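The conversion of a peptide into a string of recognized and unknown positions may be sketched as below, assuming single-letter amino acid codes and the four example aptamers (leucine, serine, isoleucine, alanine). The function name `mask_peptide` is hypothetical.

```python
def mask_peptide(sequence: str, recognized: set[str]) -> str:
    """Replace every amino acid without a corresponding aptamer with 'u'."""
    return "".join(aa if aa in recognized else "u" for aa in sequence)

APTAMER_SET = {"L", "S", "I", "A"}  # example aptamer set from the text
print(mask_peptide("ILPTEQSNAR", APTAMER_SET))
```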
  • the peptides may be limited in length, for example, to simulate the database queries obtained from sequencing a limited number of amino acids.
  • two different simulations were run, one that required there to be a number of amino acids at least as long as the sequence read (e.g., Table 3, row 1), and the other where the peptides were at least five amino acids long with "u" for unknown amino acids concatenated to the end to bring it up to the sequence read length (e.g., Table 3, row 2).
  • the results from several simulation runs are listed in Table 3. The results from these simulations are consistent with probability models. More complex simulations to accurately assess the aptamer/sequence read length combination needed for any given database may be performed.
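The two simulation modes may be illustrated with the following sketch, which is not the original simulation code. It assumes single-letter sequences, the "u" masking convention described above, and hypothetical function names (`simulated_reads`, `unique_fraction`).

```python
from collections import Counter

def simulated_reads(peptides, recognized, read_length, pad=False, min_length=5):
    """Masked reads of fixed length under the strict or the padded mode."""
    reads = []
    for pep in peptides:
        if pad:
            # Padded mode: keep peptides of at least min_length residues and
            # fill the read up to read_length with 'u'.
            if len(pep) < min_length:
                continue
        elif len(pep) < read_length:
            # Strict mode: the peptide must be at least as long as the read.
            continue
        masked = "".join(a if a in recognized else "u" for a in pep[:read_length])
        reads.append(masked.ljust(read_length, "u"))
    return reads

def unique_fraction(reads):
    """Fraction of reads that occur exactly once in the list."""
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads) if reads else 0.0

toy = ["ILPTEQSNAR", "ILPTEK", "AAGGSSR", "LLK"]  # hypothetical peptides
print(unique_fraction(simulated_reads(toy, set("LSIA"), 6)))
print(unique_fraction(simulated_reads(toy, set("LSIA"), 8, pad=True)))
```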
  • aptamers for all twenty amino acids may not be necessary to identify peptides from tryptic peptide databases. For example, even in the case where there are four aptamers, which provide 5 amino acid pools, and the sequence for 10 positions is read, 45% of the E. coli tryptic peptides (greater than 10 units long) are unique (Table 3, row 3). Thus, for the average protein containing 20 peptides there would be less than a 0.001 chance that it would be absent a unique peptide, or that in the entire database, only four proteins would be without a unique peptide that the methods herein disclosed could quantitate.
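The estimate above may be checked with a short calculation, assuming that the uniqueness of each peptide in a protein is independent of the others (a simplifying assumption not stated in the text). The function name `chance_no_unique_peptide` is hypothetical.

```python
def chance_no_unique_peptide(unique_fraction: float, peptides_per_protein: int) -> float:
    """Probability that none of a protein's peptides is unique."""
    return (1.0 - unique_fraction) ** peptides_per_protein

p = chance_no_unique_peptide(0.45, 20)
print(f"{p:.2e}")

# Using the stated upper bound of 0.001 and a database of about 4000 proteins,
# at most about four proteins would be expected to lack a unique peptide.
print(round(4000 * 0.001))
```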
  • identifying a component in a sample with a member of a database where a determined sequence is missing an amino acid may be provided.
  • An amino acid may be missing because of lack of detection or having the amino acid remain through more than one cycle as described herein, or because of miscleavage or lack of PITC formation.
  • Different probabilities for those events occurring only once per peptide may be used to estimate.
  • the estimated rates for incorporation of the PITC label onto an amino acid and cleavage are high and typically greater than 95%. Thus, probabilities in the range of 90% or greater success may be used.
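The effect of a per-cycle success rate on a multi-cycle read may be sketched with a simple binomial model, assuming each cycle succeeds independently with the same probability. The function name `miss_probabilities` is hypothetical.

```python
def miss_probabilities(per_cycle_success: float, cycles: int) -> tuple[float, float]:
    """Return (P(no missed residue), P(exactly one missed residue))."""
    p = per_cycle_success
    none_missed = p ** cycles
    exactly_one = cycles * (1.0 - p) * p ** (cycles - 1)
    return none_missed, exactly_one

for success in (0.90, 0.95):
    none_missed, exactly_one = miss_probabilities(success, 10)
    print(f"success={success:.2f}: none={none_missed:.3f}, one={exactly_one:.3f}")
```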
  • the chance of aptamers not binding to an epitope may be determined, for example, by empirical methods.
  • the effect of a limited (fewer than 20) aptamer set may be estimated by ranking the top ten most informative amino acids whose absence may have the greatest negative impact on the numbers of identifiable peptides. These aptamers may then be developed and used. For example, if it is found that lack of a leucine aptamer limits the number of unique peptides in the B. anthracis proteome to 10,000, but the lack of cysteine aptamer makes that limit 30,000, developing the leucine aptamer may be deemed of greater importance.
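A leave-one-out ranking of this kind may be sketched as follows. The toy peptide list, the function names (`count_unique`, `rank_by_impact`) and the alphabetical tie-breaking rule are illustrative assumptions only.

```python
from collections import Counter

def count_unique(peptides, recognized, read_length):
    """Number of peptides whose masked read occurs exactly once."""
    reads = ["".join(a if a in recognized else "u" for a in p[:read_length])
             for p in peptides]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1)

def rank_by_impact(peptides, aptamer_set, read_length):
    """Rank amino acids by how many unique peptides are lost without their aptamer."""
    aptamer_set = set(aptamer_set)
    baseline = count_unique(peptides, aptamer_set, read_length)
    losses = {aa: baseline - count_unique(peptides, aptamer_set - {aa}, read_length)
              for aa in aptamer_set}
    # Largest loss first; ties are broken alphabetically for reproducibility.
    return sorted(losses.items(), key=lambda item: (-item[1], item[0]))

toy = ["LAK", "GAK", "LGK", "CGR"]  # hypothetical peptides
print(rank_by_impact(toy, {"L", "C", "A"}, 3))
```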
  • Software may be used to perform direct sequence comparison or sequence alignments.
  • Aligning the sequences may not be necessary using the method herein disclosed because only a comparison to a finite number of known sequences, where the starting point for comparison will almost always be the first amino acid in the sequences, would generally be performed.
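Because the comparison starts at the first amino acid, a masked read may be looked up directly in an index of masked peptide prefixes rather than aligned. The sketch below assumes the "u" masking convention described earlier; `build_index` and `identify` are hypothetical names.

```python
from collections import defaultdict

def build_index(peptides, recognized, read_length):
    """Map each masked N-terminal prefix to the peptides that produce it."""
    index = defaultdict(list)
    for pep in peptides:
        key = "".join(a if a in recognized else "u" for a in pep[:read_length])
        index[key].append(pep)
    return index

def identify(read, index):
    """Return all database peptides consistent with a masked read."""
    return index.get(read, [])

database = ["ILPTEQSNAR", "AAGGSSR"]  # hypothetical database entries
index = build_index(database, set("LSIA"), 5)
print(identify("ILuuu", index))
```

A read that maps to exactly one peptide identifies it; a read that maps to several peptides is ambiguous at that read length.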
  • the most current databases may be downloaded from external sources, for example, for B. anthracis or for any combination of bacteria proteomes and genomes.
  • a single aptamer that recognizes several different N-terminal amino acid moieties may be used. For example, one aptamer that recognizes two N-terminal amino acid moieties, or 5 aptamers that each recognize 4 N-terminal amino acid moieties may be used in the methods herein described.
  • the five aptamers with four N-terminal amino acid moiety members roughly correlate with probability based calculations suggesting that approximately four aptamers and 10 N-terminal amino acids sequenced may be needed to identify an average peptide from a tryptic digested database, for example, for a known organism such as B. anthracis.
  • the minimal information needed to accurately identify a peptide from partial sequencing using simulation and computer modeling may be determined theoretically and/or statistically.
  • Initial bioinformatic analysis may provide simulations of database searching using limited and partial amino acid sequences for determining the optimal amino acids to develop detectable aptamers.
  • Peptides other than those in a protein database that may be present in a sample may affect the false positive rate. Therefore, simulations may be used to determine expected false positive and false negative rates under "less-than ideal" conditions, as may be expected in real world applications of the methods herein disclosed.
  • the methods herein disclosed may reduce false positives typically associated with current peptide sequencing and protein determination techniques (for example, the influence of having peptides in the sample not present in the database). For example, using software simulation tools, randomly generated tryptic peptides and tryptic peptides from C. elegans (random organism to determine false positive rates) that have no exact match in a test database may be searched against the test database. By varying the lengths of the query peptide sequences, it may be determinable how often false positives occur and at what peptide length these spurious hits may be eliminated using standard bioinformatics techniques. Thus, by determining false positive levels and analyzing the minimal number of sequenced amino acids necessary from a peptide it may be possible to uniquely identify the peptide from a group of homologs or other related biological organisms.
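A false positive simulation of this kind may be sketched as follows: query peptides with no exact match in the test database are masked, truncated to a given read length, and counted as false positives when their read coincides with a database read. The function name `false_positive_rate` and the toy sequences are hypothetical.

```python
def false_positive_rate(queries, database, recognized, read_length):
    """Fraction of non-database query peptides whose masked read hits the database."""
    def read(peptide):
        return "".join(a if a in recognized else "u" for a in peptide[:read_length])

    db_reads = {read(p) for p in database if len(p) >= read_length}
    exact = set(database)
    # Only queries that are long enough and absent from the database are scored.
    scored = [q for q in queries if len(q) >= read_length and q not in exact]
    if not scored:
        return 0.0
    hits = sum(1 for q in scored if read(q) in db_reads)
    return hits / len(scored)

database = ["ILPTEQSNAR", "AAGGSSR"]
queries = ["ILGGEQK", "SSLLAAK", "ILPTEQSNAR"]
for length in (3, 5, 7):
    print(length, false_positive_rate(queries, database, set("LSIA"), length))
```

In this toy case the spurious hit disappears once seven residues are read, illustrating how varying the read length may reveal where false positives are eliminated.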
  • Example 2 N-terminal Amino Acid Aminopeptidase-based Sequencing
  • Mutant Aminopeptidase Generation.
  • Nucleic acid molecules encoding aminopeptidases from Escherichia coli may be cloned into the Invitrogen Gateway system for analysis and expression.
  • Such peptidases include leucine aminopeptidase (PepA), methionine aminopeptidase (MAP), and the aminopeptidase ypdE (ypdE).
  • the peptidases may initially be expressed in E. coli with 6xHis Tags, added via the gateway system, to facilitate purification.
  • the aminopeptidase activity may be verified with L-amino acid 4-nitroanilides and different rate constants for the various L-amino acids may be determined.
  • a fluorescent version of the amino acid cleavage reagent may be made, for example, using Rhodamine 110, which becomes non-fluorescent when amino acids are attached to its two free amines.
  • the wild-type peptidases' response to this compound may be determined.
  • Mutagenesis of the aminopeptidases may be performed via both site directed mutagenesis and random mutagenesis in combination and separately. With site directed mutagenesis, PCR primers are generated that cover the reaction sites of the proteases. These primers may be used to mutate specific amino acids in the active site to different amino acids. These mutants may be referred to as SDAPMl.
  • Random mutagenesis may be performed using, for example, the Mutazyme random mutagenesis kit (Stratagene) or the Mn2+ PCR method (Leung, D. W., et al., Technique, (1989) 1:11-15). Mutation rates of 1 per 10000 bases may be used so that most mutants contain 1 - 2 mutations per PCR product. These mutants may be referred to as RMAP. These two pools of mutants may be mixed for the screening and selection process.
  • the in vitro compartmentalization methodology may be used for screening and selection of mutated sequences having altered peptidase activity.
  • a water/oil emulsion is generated using mineral oil, Span-80, Tween-80, and an in vitro transcription & translation kit and an appropriate Rhodamine 110 conjugate.
  • the mutant pool is then dispersed in the approximately 3 µm microdroplets to limit the number of microdroplets that contain more than 1 mutant.
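The relationship between the dilution of the mutant pool and the fraction of microdroplets holding more than one mutant may be sketched with a Poisson model. The Poisson assumption and the function name `droplet_occupancy` are illustrative and are not stated in the text.

```python
import math

def droplet_occupancy(mean_per_droplet: float) -> tuple[float, float, float]:
    """Return (P(empty), P(exactly one mutant), P(more than one mutant))."""
    lam = mean_per_droplet
    p_empty = math.exp(-lam)
    p_single = lam * p_empty
    return p_empty, p_single, 1.0 - p_empty - p_single

# With an average of 0.1 mutant per droplet, multiply occupied droplets are rare.
p_empty, p_single, p_multiple = droplet_occupancy(0.1)
print(f"empty={p_empty:.4f}, single={p_single:.4f}, multiple={p_multiple:.4f}")
```

Lowering the average number of mutants per droplet reduces multiply occupied droplets at the cost of more empty droplets.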
  • the emulsion is then heated at 37°C to synthesize the mutant aminopeptidases in each compartment and allow the peptidase to digest the screening reagent.
  • Microdroplets that contained fluorescence may be isolated by flow cytometry into 96 well plates.
  • the isolated mutants may then be tested again for the ability to digest the target molecule and for general specificity to other amino acids. Mutants having acceptable characteristics, those that digest the target with the highest kinetics while being the most specific to the target molecule, may be used for the next round of random mutagenesis. Each round of selection and screening should produce mutant aminopeptidases with higher kinetics and specificity. Generally this process should take 3 - 7 rounds of mutant generation and screening.
  • Initially, a mutant that recognizes only biotinylated N-terminal amino acid(s) and has a broad specificity to most or all amino acids should be isolated. This mutant would then be used to generate other mutants that have high specificity for individual or small groups of biotinylated amino acids.
  • the aminopeptidase based sequencing follows a logical progression of five experimental steps (Figure 3); some of these steps are repeated.
  • the first step is to prepare the sample and conjugate the peptides to a microscope slide.
  • the second step involves the biotinylation of the N-terminus of each conjugated peptide. This may require that cysteine and lysine side chains be blocked to prevent biotin labeling.
  • the third step detects the presence of the biotin with a fluorescently labeled protein such as streptavidin or an antibody.
  • the fourth step adds an aminopeptidase specific for a biotinylated amino acid(s). Steps 3 and 4 are repeated for each aminopeptidase with a final biotin detection step at the end.
  • Steps 2 through 4 are repeated between about 4 to 15 times.
  • the fifth step analyzes the sequential image files to determine which amino acid(s) is at or most probably at each position of the sequenced peptides. This process allows for both the identification (sequencing) and quantitation (counting the number of times each peptide was found on the slide) of each peptide.
  • a protein sample is prepared for analysis by proteolytic digestion followed by desalting.
  • Tissue extracted from an adult rat is homogenized on ice for 15 in Lysis Buffer (25 mM Tris pH 8.0, 5 mM DTT, 8 M Urea, 100 mM NaCl).
  • the lysate is centrifuged at 15,000 x g for 10 minutes.
  • the sample is then heated to 37° C for 1 hour to reduce disulfide bonds.
  • the sample is diluted 7:1 with Digestion Buffer (25 mM Tris pH 7.6). Sequencing grade trypsin is added to the sample at a ratio of 50:1 for sample to trypsin.
  • the reaction is then incubated overnight at 37° C.
  • the next morning the sample is desalted with a SepPak
  • the sample is then lyophilized to a dry powder and resuspended in 80% pyridine.
  • a slide is coated with phenyl-biisothiocyanate such that one of the isothiocyanate groups is covalently linked to the slide. This group will react with both lysines and the N-termini of the peptides.
  • the sample is then placed on the slide and allowed to react 2 hours at room temperature.
  • the N-terminal amino acid is then cleaved from the peptides by the addition of 95% TFA. This reaction will leave the peptides bound by the isothiocyanate attached to a lysine. Non-lysine containing peptides will not be bound.
  • reaction chemistries can be used to attach the peptides to the slide, such as a methyl methanethiosulfonate group to react with sulfhydryls or a p-hydroxyphenylglyoxal group to react with arginines.
  • the slide is then washed in 0.1 M ethanolamine 25 mM Tris pH 8.0 to react with the remaining isothiocyanate groups.
  • the slide should then be washed three times for 1 minute in 50 mM phosphate buffer pH 8.5.
  • the next step in the aminopeptidase-based sequencing is to add biotin to all free amines, which includes the amine termini of the peptides.
  • This reaction is accomplished by activating the carboxyl group of biotin with sulfo-NHS and EDC (1-Ethyl-3-[3-dimethylaminopropyl]carbodiimide Hydrochloride) in 50 mM MES buffer at pH 4.5. The pH is then adjusted to 8.5 and the sulfo-NHS-biotin is added to the peptide conjugated slide. This reaction is incubated at 50° C for 30 minutes. The residual unreacted biotin is then washed away using a buffer of 25 mM Tris pH 8.0.
  • the third step in the aminopeptidase-based sequencing is to incubate the microscope slide with fluorescently labeled streptavidin and acquire digital images of the areas showing where the streptavidin binds. After removal of the excess biotin reagent, a solution of 20 microgram/mL Cy3 labeled streptavidin in S-Buffer (25 mM Tris pH 8.0, 100 mM NaCl, 0.1% Tween-20) is incubated on the slide for 15 minutes at 25° C. The excess streptavidin is then washed off the slide with three 1 minute washes with S-Buffer. A digital image of 1 or more areas of the slide is taken with a motorized fluorescent microscope system (e.g.
  • the fourth step is to digest the N-terminal biotinylated amino acids from the peptides on the slide.
  • the first of the specific biotin aminopeptidases will be incubated on the slide for about 10 minutes at 37° C. The exact time that each individual aminopeptidase mutant is incubated on the slide will depend on the reaction kinetics of that particular mutant. Step three is repeated, and the next in the series of aminopeptidase mutants is then incubated on the slide, followed again by step three.
  • Steps 2 through 4 are repeated between 4 - 20 times, and each cycle identifies the current N-terminal amino acid for each peptide and then removes that amino acid. This process is how the sequence information and identity of each peptide is determined.
  • the exact number of residues that need to be sequenced is dependent on the number and specificity of the aminopeptidases and the complexity of the sample and database searched. Bioinformatics simulations for the aptamer based sequencing method apply here as well and provide an estimate of the number of residues that will need to be sequenced.
  • the fifth step may be automated by computer software and image analysis algorithms but may also be performed manually as follows. Each set of images from the cycle of steps 2 - 4 is a position in the peptide. The first cycle represents the second residue from the N-terminal end of the original tryptic peptide as the first amino acid was removed during the isothiocyanate based attachment step. Each subsequent cycle would represent the amino acid in the peptide as 1 + the current cycle number.
  • So cycle 5 would represent the sixth residue in the peptides.
  • the first fluorescent image shows which peptides currently have an attached biotin and what their relative positions are on the slide.
  • the images before and after each aminopeptidase cycle demonstrate whether the biotin labeled amino acid was cleaved by the current peptidase. For example, if the image before the aminopeptidase addition had the biotin still attached at a specific position on the slide and the image after incubation with aminopeptidase shows loss of that fluorescent signal, then it is expected that the amino acid(s) recognized by that aminopeptidase are at the current residue position.
  • the potential amino acids determined by aminopeptidase treatment for each cycle are recorded based on the X and Y position within the image, corrected for any slight offset caused by the motorized stage movement using the patterned stationary fluorophore.
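The manual analysis described above may be sketched for a single spot on the slide as follows. Each cycle is represented as a list of (aminopeptidase label, signal before, signal after) tuples; the residue position is 1 + the cycle number because the first residue is removed during attachment. The loss threshold of 0.5 and the function name `call_positions` are hypothetical.

```python
def call_positions(cycles, threshold=0.5):
    """Assign an aminopeptidase label to each residue position of one spot.

    cycles: one list per sequencing cycle of (label, before, after) intensities.
    A label is called when the signal after treatment falls below
    threshold * signal before; otherwise the position is left as 'u'.
    """
    calls = {}
    for cycle_number, steps in enumerate(cycles, start=1):
        position = 1 + cycle_number  # residue position in the original peptide
        calls[position] = "u"
        for label, before, after in steps:
            if before > 0 and after / before < threshold:
                calls[position] = label
                break
    return calls

spot = [
    [("L", 100, 95), ("S", 95, 5)],   # signal lost after the serine-specific enzyme
    [("L", 90, 4)],                   # signal lost after the leucine-specific enzyme
    [("L", 80, 78), ("S", 78, 75)],   # no loss: residue not recognized
]
print(call_positions(spot))
```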
  • the sequence can be used to search a database of theoretical tryptic peptides for the best match.
  • the tryptic database is built based on the organism(s) that the user wishes to search by taking all the known proteins in that organism(s) proteome and calculating an in silico tryptic digest of those proteins.
  • Example 3 Identification of Peptides Using Peptide Sequences
  • a bioinformatics simulation was performed to predict the amount of peptide sequence information that may be needed to identify a peptide.
  • the study was designed to address this question in terms of the specific sequence information available when not all of the amino acids can be distinguished by an aptamer based recognition approach.
  • These calculations included analysis of problems that could arise during sequencing that may or may not be specific to the CIMPS methodology.
  • the question of potential false positive rates at both the peptide and organism level was also taken into consideration.
  • the ability for the methods to differentiate closely related species of bacteria was also examined.
  • the code to run the bioinformatics simulations was written in the C programming language using the LabWindows/CVI programming environment. Briefly, the FASTA format protein database for a given bacterium or group of organisms was downloaded from NCBI. The database was then digested in silico with trypsin to generate a list of peptides of at least 4 amino acids in length. For multiple organism databases, all peptides that were present multiple times in the data set were reduced to just one instance of that sequence. The sequence was then copied to an adjoining text. To search this peptide database, each peptide in the database was searched against the whole database using various parameters to see if the peptide was unique under a given set of conditions.
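The database construction steps may be sketched in Python as follows. This is an illustration only and not the C code referred to above. It assumes the common simplified trypsin rule (cleavage after K or R, except before P), which the text does not specify, and uses hypothetical function names and toy FASTA records.

```python
import re

def parse_fasta(text: str) -> list[str]:
    """Return the protein sequences contained in FASTA-formatted text."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences

def tryptic_peptides(protein: str, min_length: int = 4) -> list[str]:
    """In silico digest: cleave after K or R unless the next residue is P."""
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [f for f in fragments if len(f) >= min_length]

def build_peptide_database(fasta_texts, min_length: int = 4) -> list[str]:
    """Digest every protein and keep a single instance of each peptide."""
    seen = {}
    for text in fasta_texts:
        for protein in parse_fasta(text):
            for peptide in tryptic_peptides(protein, min_length):
                seen.setdefault(peptide, None)  # dict keeps first-seen order
    return list(seen)

organism_a = ">p1\nMKTAYIAKQR\nLLDEGK\n>p2\nAAAAKPGGGR\n"
organism_b = ">q1\nTAYIAKWWWWR\n"
print(build_peptide_database([organism_a, organism_b]))
```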
  • $FP_{pep} = \frac{\text{Number of Misidentified Peptides}}{\text{Number of Peptides Searched}}$
  • $FP_{unique} = FP_{pep}^{\,\text{The\_Number\_of\_Times\_That\_Sequence\_Is\_IDed}}$
  • The organisms used for the simulations to create the databases of peptides are as follows. For the single organism study, Escherichia coli strain K-12 was used to create the peptide database and will be referred to as E. coli. For the multiorganism simulations, the following organisms were used:
  • Salmonella typhi Salmonella typhi
  • KIM Vibrio cholerae Yersinia pestis
  • Bacillus anthracis (Ames) Bacillus anthracis (Sterne) Bacillus anthracis (Sterne)
  • Bacillus cereus (ATCC 10987) Bacillus cereus (ZK)
  • results of these methods identify amino acid information content, multiple aptamers, multiple aptamers at different read lengths, aptamers to multiple amino acids, false positive (FPpep) rate for different methods, false positive (FPpop) rates, multiple aptamers at 10 res, multiple aptamers at different read lengths, and other false positive rates.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Hematology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Nanotechnology (AREA)
  • Medicinal Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Cell Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Materials Engineering (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Peptides Or Proteins (AREA)

Abstract

Methods for sequencing a polypeptide and methods for structurally characterizing the polypeptide using labeled N-terminal amino acid specific complexing agents are disclosed. These methods relate to using arrays for identifying specific polypeptides of interest from a sample comprising multiple polypeptides. A method for identifying at least a portion of a polypeptide or protein of interest from a sample using N-terminal amino acid specific complexing agents is disclosed.

Description

CONCURRENT IDENTIFICATION OF MULTITUDES OF POLYPEPTIDES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 61/118,818, filed December 1, 2008, the entirety of which is hereby incorporated by reference.
BACKGROUND
Technical Field of the Invention
[0001] The invention relates to the use of complexing agents with specificity for N-terminal amino acids or their derivatives in sequencing and structurally characterizing polypeptides. Complexing agents may include, but are not limited to, antibodies, proteins, peptides, peptoids, DNA, RNA, PNA, GNA, TNA, or aptamers.
[0002] Chemical protein sequencing has been and continues to be a popular method for determining the primary structure of proteins. See Stolowitz, "Chemical Protein Sequencing and Amino Acid Analysis," Curr. Opin. Biotech. 4:9-13 (1993) and Hunkapiller, M. W., "Contemporary Methodology for the Determination of the Primary Structure of Proteins," Macromol. Seq. and Synthesis, Ed. D. H. Schlesinger, pp. 45-58, Alan R. Liss: New York, N.Y. (1988).
[0003] Traditional chemical amino-terminal sequencing includes a degradation step such as Edman degradation and a separate detection step. Edman degradation typically includes a derivatization step, a cleavage step, and a conversion step. For example, in an Edman degradation, the amino terminus of a target polypeptide is derivatized to a thiocarbamoyl, which is cleaved from the polypeptide with an organic acid. The cleaved amino acid may be converted to a phenylthiohydantoin (PTH) form by treatment with an aqueous solution of organic acid. The PTH amino acid may then be detected, for example, by high pressure liquid chromatography (HPLC) or by mass spectrometry (Aebersold, R., et al., "Design, Synthesis, and Characterization of a Protein Sequencing Reagent Yielding Amino Acid Derivatives with Enhanced Detectability by Mass Spectrometry," Protein Science 1:494-503 (1992)). The reagents of the Edman process may be delivered to a target polypeptide which is covalently or non-covalently attached to a solid support. Solid supports used in protein sequencing include polyvinylidene difluoride (PVDF), glass beads or polystyrene beads.
[0004] In another chemical sequencing method, the degradation step includes the thioacetylation of the amino-terminal amino acid for detection by gas chromatography/mass spectrometry (Stolowitz, M L et al., "Thioacetylation Method of Protein Sequencing: Gas Chromatography/Ion Trap Mass Spectrometric Detection of 5-acetoxy-2-Methylthiazoles," J Protein Chem. 11:360-361 (1992)).
[0005] In another chemical sequencing method, dabsyl chloride labels the peptide, which is then hydrolyzed with hydrochloric acid. The dabsyl-amino acid may be identified chromatographically.
[0006] Enzymatic digestion of terminal amino acids has been used to sequence polypeptides, for example, using amino-terminal or carboxy-terminal specific exopeptidases. Examples of exopeptidases include the aminopeptidase 1, LAP, (Liver Activating Protein) proline aminodipeptidase, leucine aminopeptidase, microsomal peptidase and cathepsin C. Serine carboxypeptidases have proven to be useful in sequentially removing residues from the C-terminus of a protein or a polypeptide. Carboxypeptidase Y (CPY), in particular, is an attractive enzyme because it non-specifically cleaves all residues from the C-terminus, including proline. See, e.g., Breddam et al. (1987) Carlsberg Res. Commun. 52:55-63, U.S. Pat. No. 5,869,240 (Patterson); U.S. Pat. No. 5,792,664 (Chait et al.); and Tsugita et al. (1992) "C-terminal Sequencing of Protein: A Novel Partial Acid Hydrolysis and Analysis by Mass Spectrometry," Eur. J. Biochem. 206:691-696.
[0007] The procedures described above may require, at a minimum, sub-femtomole (>10^-15) concentrations of polypeptide. The methods may also be sensitive to the purity of the polypeptide sample, which may give rise to sequencing errors. Carryover of incomplete amino-terminal cleavage into the next cycle may result in a steadily increasing proportion of a population of molecules being out of phase with the expected order of release. Additionally, recovery and detection of the cleaved N-terminal amino acid may be difficult and/or time-consuming under current procedures.
[0008] Knowing the primary structure and composition of polypeptides is important for scientific and medical research and in the development of medical treatments. Many proteins are modified after translation. Specific side chains of some proteins may be altered. Chemical analyses of proteins in their final form are typically needed to delineate post-translational changes, which may be used for understanding biological activities of a protein or for providing potential drug targets for pathogens. [0009] Much effort in developing high throughput methods for biological agent detection has focused on nucleic acid based technologies. However, a protein based detection scheme may provide for the detection of closely related classes of biological agents since these agents possess a multitude of species and strain specific protein molecules. For RNA and DNA viruses, for example, the coat or envelope proteins may be targeted. In vegetative or sporulated bacteria, any unique protein or series of particular marker proteins may be targeted to detect the presence of a particular species. With protein toxins, the toxin molecule itself may be used as the biomarker. Current technologies for multiplexed protein analysis include antibody microarrays, 2-dimensional gel electrophoresis, and shotgun proteomics. These methods may be limited in their ability to overcome the sensitivity and selectivity problems associated with the analysis of such a diverse chemical milieu of polypeptides present in biological samples.
[0010] The methods herein are directed to new assay technologies that allow identification and quantitation of multitudes of peptides at the single molecule level with a low false positive rate. This technology has application to the specific detection of novel, emerging or engineered peptides or proteins as well as application to the detection of functional signatures of known peptides or proteins.
[0011] Methods herein described may provide a highly sensitive and rapid method for sequencing a polypeptide that does not require labeling of the target polypeptide before sequencing and avoids the repeated isolation and analysis of cleaved portions of a polypeptide as in past sequencing methods. Methods described herein apply to sequencing from either the C- or N-terminus of any peptide. As an example, the methods described herein provide for identifying the N-terminal amino acid of a peptide while it is still covalently linked to the peptide. The methods described herein may provide for sequencing or structurally characterizing a polypeptide using an N-terminal amino acid complexing agent.
SUMMARY OF THE INVENTION
[0012] A process for simultaneously identifying N-terminal amino acids of two or more polypeptides is provided. The process comprises forming a complex between an N-terminal amino acid moiety of a polypeptide and a complexing agent capable of binding to the N-terminal amino acid moiety. The complexing agent has a detectable label. Detecting the detectable label in the complex provides for identification of the N-terminal amino acid of the polypeptide.
[0013] A process for determining at least a portion of the amino acid sequence of a polypeptide of interest is also provided. The process comprises the steps of forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety of the polypeptide. The complexing agent is selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. The detectable label in the complex is detected in order to identify the N-terminal amino acid moiety of the polypeptide. The N-terminal amino acid moiety from the polypeptide is removed. The steps of forming a complex and detecting the detectable label in the complex are repeated in order to determine at least a portion of the amino acid sequence of the polypeptide of interest.
[0014] A process for determining at least a portion of the amino acid sequence of a plurality of polypeptides in a sample is also provided. The process comprises bonding at least some of the plurality of polypeptides of the sample, each at a specific location on a surface. The surface is contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. Complexes between N-terminal amino acid moieties of the at least some of the polypeptides and a complexing agent from the set of complexing agents are formed. The detectable label in the complex is detected at specific locations on the surface in order to identify the N-terminal amino acid moiety at the specific locations on the surface. The N-terminal amino acid moieties of the polypeptides at the specific locations on the surface are removed, and the complexing and detection steps are repeated in order to determine at least a portion of the amino acid sequence of the at least some of the polypeptides at the specific locations on the surface.
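The repeated complexing, detection and removal steps on a surface can be pictured as a loop over locations and cycles. The following is a minimal simulation sketch, not the disclosed assay itself: the location names, the four-residue label table and the function `sequence_surface` are hypothetical, and each complexing agent is assumed to recognize exactly one residue and to bind without error.

```python
from typing import Dict

# Hypothetical label set: one distinguishable label per N-terminal residue.
LABEL_FOR_RESIDUE: Dict[str, str] = {
    "A": "label_1",
    "G": "label_2",
    "K": "label_3",
    "L": "label_4",
}
RESIDUE_FOR_LABEL = {label: res for res, label in LABEL_FOR_RESIDUE.items()}


def sequence_surface(bound: Dict[str, str], cycles: int) -> Dict[str, str]:
    """Simulate complex/detect/remove cycles at every surface location.

    `bound` maps a location identifier to the polypeptide attached there,
    written N-terminus first in one-letter code.
    """
    remaining = dict(bound)
    reads = {location: "" for location in bound}
    for _ in range(cycles):
        for location, peptide in remaining.items():
            if not peptide:
                continue  # nothing left to read at this location
            # Complex formation and detection: the label seen at this
            # location identifies the current N-terminal residue.
            label = LABEL_FOR_RESIDUE.get(peptide[0])
            reads[location] += RESIDUE_FOR_LABEL.get(label, "?")
            # Removal step exposes the next residue for the next cycle.
            remaining[location] = peptide[1:]
    return reads


if __name__ == "__main__":
    print(sequence_surface({"spot_1": "GALK", "spot_2": "KAG"}, cycles=3))
```

A residue with no matching agent in the hypothetical label table is recorded as "?", mirroring a cycle in which no label is detected at that location.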
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide.
[0016] FIG. 2 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide using an aptamer complexing agent.
[0017] FIG. 3 illustrates an embodiment depicting a process for identifying and/or sequencing a surface bound polypeptide using N-terminal aminopeptidases.
DETAILED DESCRIPTION
[0018] In one aspect, a process for identifying the N-terminal amino acid of a polypeptide is provided. The process comprises forming a complex between an N-terminal amino acid moiety of a polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety. The complexing agent has a detectable label. Detecting the detectable label in the complex provides for identification of the N-terminal amino acid of the polypeptide.
[0019] In another aspect, a process for determining at least a portion of the amino acid sequence of a polypeptide of interest is provided. The process comprises the steps of forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety of the polypeptide. The complexing agent is selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) has a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. The detectable label in the complex is detected in order to identify the N-terminal amino acid moiety of the polypeptide. The N-terminal amino acid moiety from the polypeptide is removed. The steps of forming a complex and detecting the detectable label in the complex are repeated in order to determine at least a portion of the amino acid sequence of the polypeptide of interest.
[0020] In another aspect, a process for determining at least a portion of the amino acid sequence of a plurality of polypeptides in a sample is provided. The process comprises bonding at least some of the plurality of polypeptides of the sample, each at a specific location on a surface.
The surface is contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. Complexes between N-terminal amino acid moieties of the at least some of the polypeptides and a complexing agent from the set of complexing agents are formed. The detectable label in the complex is detected at specific locations on the surface in order to identify the N-terminal amino acid moiety at the specific locations on the surface. The N-terminal amino acid moieties of the polypeptides at the specific locations on the surface are removed, and the complexing and detection steps are repeated in order to determine at least a portion of the amino acid sequence of the at least some of the polypeptides at the specific locations on the surface. Prior to describing the methods herein disclosed in further detail, however, the following terms will first be defined.
[0021] In another aspect, the methods of the present invention may be performed by (a) labeling the N-terminal amino acid of a polypeptide with a complexing agent having a detectable label, (b) detecting the presence of the labeled N-terminal amino acid, and (c) removing the N-terminal amino acid using an N-terminal amino acid removing agent having specificity for one or more labeled amino acids. Such steps may be repeated to identify each subsequent N-terminal amino acid after removal of the prior complexed N-terminal amino acid. Such methods may be performed on multiple identical or different polypeptides in a high-throughput method to determine the sequence of multiple polypeptides simultaneously.
For example, the methods of the present invention can be used to determine the amino acid sequence of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 150, 500, or more different polypeptides simultaneously. [0022] Definitions:
[0023] As used herein, the term "polypeptide" refers generally to a molecule that comprises one or more amino acid monomers covalently linked together. "Polypeptide" includes proteins as well as short polypeptides that are approximately 100 amino acids or less in length. In one embodiment, the polypeptide is 10 amino acids or greater in length. Polypeptides may be artificially synthesized, isolated from nature or modified for compatibility with the methods herein described (e.g., the polypeptide may be digested with trypsin to reduce its size, treated with other enzymes to remove polysaccharides, treated with mild acid or neuraminidase to remove sialic acid, reacted with alkaline phosphatase to remove phosphate, or treated with sulfatases or chemical means to remove sulfate or oxidize thiols).
[0024] As used herein, the term "N-terminal amino acid moiety" refers to an N-terminal amino acid or its derivative. The term N-terminal amino acid moiety includes post-translationally modified N-terminal amino acids. Post-translationally modified N-terminal amino acids include, for example, amino acids resulting from deamidation of glutamine or asparagine, or partial tryptic peptides.
[0025] As used herein, the term "C-terminal amino acid moiety" refers to a C-terminal amino acid or its derivative. The term C-terminal amino acid moiety includes post-translationally modified C-terminal amino acids. Post-translationally modified C-terminal amino acids include, for example, amino acids resulting from methylation or amidation of the C-terminal residue.
[0026] As used herein, the term "subset of N-terminal amino acid moieties" refers to a group of N-terminal amino acid moieties, less than all of the amino acids, having a shared chemical or structural relationship. For example, the subset of aspartic acid and glutamic acid N-terminal moieties are chemically related as being acidic.
The N-terminal amino acid moieties of histidine, phenylalanine, tryptophan, and tyrosine are structurally related as having an aromatic substituent, for example.
[0027] As used herein, the term "subset of C-terminal amino acid moieties" refers to a group of C-terminal amino acid moieties, less than all of the amino acids, having a shared chemical or structural relationship. For example, the subset of aspartic acid and glutamic acid C-terminal moieties are chemically related as being acidic. In another example, the C-terminal amino acid moieties of histidine, phenylalanine, tryptophan, and tyrosine are structurally related as having an aromatic substituent.
[0028] As used herein, the term "peptoid" refers to a peptidomimetic that results from the oligomeric assembly of N-substituted glycines. The peptoid may be substituted along its backbone in a manner analogous to an amino acid molecule.
[0029] As used herein, the term "aptamer" refers generally to small single stranded RNAs or DNAs of approximately 10-120 nucleotides in length that are capable of forming secondary and tertiary structures. Aptamers include oligonucleotides which bind to N-terminal amino acid moieties. Aptamers include, for example, those having affinity to a specific amino acid as disclosed in Gold, et al., "Diversity of Oligonucleotide Function," Ann. Rev. Biochem. 64: 763-97 (1995).
[0030] The term "N-terminal amino acid removing reagent" refers to a compound or composition of matter capable of removing a single amino acid moiety from the N-terminus of the polypeptide. For example, the N-terminal amino acid removing reagent may include aminopeptidases such as leucine aminopeptidase, microsomal peptidase, aminopeptidase 1, LAP (Liver Activating Protein), proline aminodipeptidase, cathepsin C and those identified using the methods disclosed herein.
The N-terminal amino acid removing reagent may be a chemical compound, such as those known in the art for catalyzing the cleavage of the terminal monomers of polypeptides. Chemical agents include, but are not limited to, cyanogen bromide, hydrochloric acid, sulfuric acid, and pentafluoropropionic fluorohydride.
[0031] As used herein, "comprising," "including," "containing," "characterized by," and grammatical equivalents thereof are inclusive or open-ended terms that do not exclude additional, unrecited elements or method steps. "Comprising" is to be interpreted as including the more restrictive terms "consisting of" and "consisting essentially of."
[0032] As used herein, "consisting of" and grammatical equivalents thereof exclude any element, step, or ingredient not specified in the claim.
[0033] As used herein, "consisting essentially of" and grammatical equivalents thereof limit the scope of a claim to the specified materials or steps and those that do not materially affect the basic and novel characteristic or characteristics of the claimed invention.
[0034] The methods herein described comprise forming a complex between an N-terminal amino acid moiety and a complexing agent. The complex may be detected and the N-terminal amino acid identified. The complexing agent may be washed away after the N-terminal amino acid is identified and the N-terminal amino acid may then be removed should the identification of the adjacent N-terminal amino acid be desired. In the event that the identification of additional adjacent N-terminal amino acids is desired, the method may be repeated.
[0035] The methods described herein may provide for identification of the N-terminal amino acid moieties such as derivatives. A derivative of the N-terminal amino acid may be formed with a compound that forms a stable covalent bond. By way of example, a derivative of the N-terminal amino acid may be formed from reaction with fluorodinitrobenzenes, dabsyl chlorides, dansyl chlorides or phenyl isothiocyanates. For example, the derivative may be a phenylthiocarbamoyl derivative of the N-terminal amino acid formed by reaction with a phenyl isothiocyanate.
[0036] In one aspect, the methods herein described utilize a complexing agent specific for an N-terminal amino acid moiety of a peptide or subset of N-terminal amino acid moieties. Complexing agents specific for N-terminal amino acid moieties may have high affinity and/or high specificity for the N-terminal amino acid moiety. Complexing agents may have affinities, defined as an equilibrium dissociation constant, in the micromolar to about sub-nanomolar range; however, complexing agents having other affinities may also be used. High affinity and high specificity binding complexing agents may be derived, for example, from combinatorial libraries.
[0037] In other aspects, the methods may utilize a complexing agent capable of complexing with an N-terminal amino acid moiety of a peptide. Such complexing agents may be capable of complexing with N-terminal amino acid moieties with high affinity to the N-terminal amino acid moieties.
[0038] The complexing agent may be an antibody. Antibodies specific to N-terminal amino acid moieties may be screened for specificity for an N-terminal amino acid moiety, isolated and labeled using conventional chemistries and techniques. The antibodies may be mono- or polyclonal or may be fragments of whole antibody. Antibodies specific to N-terminally modified peptide (e.g., PITC-amino acids) may be generated and used to substitute for another complexing agent (e.g., aptamer) that otherwise could not be readily generated.
[0039] The complexing agent may be a protein, peptide, or peptoid. The protein, peptide, or peptoid may be screened for specificity for an N-terminal amino acid moiety, isolated and labeled using conventional chemistries and techniques. (See, e.g., Emili, et al., Nature Biotech. (2000) 18:393-397.)
[0040] The complexing agent may be DNA, RNA (for example, tRNA), peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA) or an aptamer. The DNA, RNA, PNA, GNA, or TNA may be single or double stranded with at least a portion thereof capable of specifically binding an N-terminal amino acid moiety. Such complexing agents with specificity for an N-terminal amino acid moiety may be screened, isolated and labeled using conventional and modifications of conventional chemistries and techniques. (Proske, D., et al., Appl Microbiol Biotechnol 2005, 69, 367-374; Ellington, A. D., et al., Nature 1990, 346, 818-822; Ellington, A. D. et al., Nature 1992, 355, 850-852; Geiger, A., et al., Nucleic Acids Res 1996, 24, 1029-1036; Majerfeld, I., et al., Nat Struct Biol 1994, 1, 287-292; Tuerk, C., et al., Science 1990, 249, 505-510). A preferred complexing agent may include RNA or DNA aptamers.
[0041] Any method may be used to screen randomized oligonucleotides for aptamer activity. General methods for screening randomized oligonucleotides for aptamer activity such as, for example, the "SELEX" (Systematic Evolution of Ligands by Exponential Enrichment) method as described in Gold, et al. (U.S. Pat. No. 5,270,163) are known. In the event that the Kd for a particular amino acid(s) (e.g., valine) precludes successful aptamer development for use as a detectably labeled aptamer specific to an N-terminal amino acid, the size of the epitope may be increased by making aptamers to the corresponding PITC derivatized amino acid(s). In embodiments, aptamers capable of recognizing N-terminal amino acids from a peptide with high affinity and specificity include aptamers for both phenyl isothiocyanate modified and unmodified amino acids.
[0042] Aptamers with specificity and affinity for N-terminal amino acid moieties may be used for polypeptide and protein identification and/or sequencing according to the methods herein described. Aptamers may have high affinities, with equilibrium dissociation constants ranging from 100 micromolar to sub-nanomolar depending on the selection used, and/or have high selectivity for a single N-terminal amino acid moiety. For example, aptamers with equilibrium dissociation constants less than 20 μM, less than 10 μM, less than 5 μM, or less than 3 μM may be used. Aptamers may be of high affinity for more than one N-terminal amino acid moiety, for example, a subset of N-terminal amino acid moieties. In such a case, the affinity for the subset of N-terminal amino acid moieties may be detectably distinguishable. Aptamers may be selective for more than one amino acid moiety, for example, a subset of N-terminal amino acid moieties. In such a case, the selectivity for the subset of N-terminal amino acid moieties may be detectably distinguishable.
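To give a feel for what an equilibrium dissociation constant in these ranges implies, the sketch below uses the standard single-site binding relation, fraction bound = [agent] / ([agent] + Kd). It assumes simple 1:1 binding with the complexing agent in large excess over the immobilized polypeptide; the function name and the concentrations are illustrative choices, not values from this document.

```python
def fraction_bound(agent_conc_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of N-termini occupied by a complexing agent.

    Assumes simple 1:1 binding with the agent in excess, so the free agent
    concentration is approximately the total agent concentration.
    """
    if agent_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return agent_conc_molar / (agent_conc_molar + kd_molar)


if __name__ == "__main__":
    agent = 1e-6  # hypothetical 1 micromolar aptamer concentration
    for kd in (20e-6, 3e-6, 1e-9):
        print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(agent, kd):.3f}")
```

At an agent concentration equal to Kd, half of the sites are occupied, so a lower Kd lets a given occupancy be reached with less labeled agent.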
[0043] Aptamers may be modified to improve binding specificity or stability as long as the aptamer retains a portion of its ability to bind and recognize its target amino acid. Methods for modifying the bases and sugars of nucleotides are known in the art. An aptamer may comprise a phosphodiester, phosphoroamidite, phosphorothioate or other known linkage between its nucleotides if the linkage does not substantially interfere with the interaction of the aptamer with its target amino acid.
[0044] Aptamers suitable for use in the methods herein described may be synthesized by a polymerase chain reaction (PCR), a DNA or RNA polymerase, a chemical reaction or a machine synthesizer according to standard methods known in the art such as, for example, an automated DNA synthesizer from Applied Biosystems, Inc. (Foster City, Calif.) using standard chemistries. Aptamers against derivatized amino acids (e.g., PITC-amino acid derivative) may be more specific or have greater affinity due to an increase in the size of the epitope that the aptamer must recognize.
[0045] Multiple complexing agents may be used. For example, a set of complexing agents may be used, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. The number of complexing agents may be equal to or less than the number of amino acids present, known or expected to be present in the sequences of the peptides. The selectivity of the complexing agent may be such that the complexing agent is nonspecific to a limited number of amino acids present, known or expected in the sequences of the peptides. The affinity of a non-selective complexing agent may be such that although the complexing agent is non-specific to more than one amino acid, it has a higher affinity for one of the amino acids.
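When fewer complexing agents than amino acids are used, each detected label narrows the N-terminal residue to a subset rather than to a single amino acid. The sketch below shows one hypothetical way such a read could be checked against a candidate sequence. The acidic and aromatic groupings follow the subsets named in the definitions above; the label names, the glycine-specific agent and the function `is_consistent` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical label-to-subset table (one-letter amino acid codes).
LABEL_TO_RESIDUES: Dict[str, FrozenSet[str]] = {
    "acidic_label": frozenset("DE"),      # aspartic acid, glutamic acid
    "aromatic_label": frozenset("HFWY"),  # His, Phe, Trp, Tyr
    "glycine_label": frozenset("G"),      # assumed single-residue agent
}


def is_consistent(label_read: Sequence[str], candidate: str) -> bool:
    """Return True if `candidate` (N-terminus first) could have produced
    the ordered series of labels in `label_read`."""
    if len(candidate) < len(label_read):
        return False
    return all(
        residue in LABEL_TO_RESIDUES[label]
        for label, residue in zip(label_read, candidate)
    )


if __name__ == "__main__":
    read = ["acidic_label", "aromatic_label", "glycine_label"]
    for seq in ("EWGK", "DYGA", "KWGA"):
        print(seq, is_consistent(read, seq))
```

Only the first `len(label_read)` residues of a candidate are compared, since later residues have not yet been exposed by the removal step.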
[0046] Properties of the complexing agent may be improved by modification of all or a portion of the complexing agent. Using a single stranded RNA aptamer for example, a pyrimidine may be replaced with a 2'-fluoro-pyrimidine to increase its affinity, or the backbone of the RNA aptamer may be replaced by phosphorothioate or phosphoroamidite to increase the stability of an aptamer or its affinity for its target. Mixtures of complexing agents may be exposed to the target N-terminal amino acid moiety and then subjected to crosslinking such that a covalent linkage is formed between the relevant complexing agents and target amino acid moiety. Such modifications should not substantially interfere with PCR amplification of modified nucleotide-based complexing agents. Improving the properties of the complexing agent includes selection of a suitable group of agents; the group of agents may then be modified and/or partitioned to select for improved affinity and selectivity. Modification may also be made to the complexing agent to limit the non-specific binding of the agent.
[0047] The complexing agent may comprise a detectable label. The detectable label may be a radiolabel, luminescent, chemiluminescent, electrochemical, colorimetric, or colloidal gold label. Luminescent labels include fluorophores and quantum dots. Complexing agents specific for different N-terminal amino acid moieties may be labeled with a different label such that each amino acid known or expected to be present in the polypeptide may be detected and/or distinguished. When multiple complexing agents specific for different N-terminal amino acid moieties are used, for example, a set of complexing agents, each complexing agent may be specific for an N-terminal amino acid moiety and may have a detectable label that is distinguishable from the other detectable labels of the set. For example, the wavelength emissions of each label corresponding to each N-terminal amino acid specific complexing agent may be distinct from each other. The complexing agents may optionally have two or more of the same or different labels attached to them. By way of example, fluorophores useful in the methods herein described may be commercially obtainable, such as TAMRA, Hoechst dye, fluorescein, rhodamine, Texas Red, 40 nm fluorescent beads sold by Molecular Probes TransFluro Spheres (Q-dots™), or any of the Cy or Sypro dyes. Fluorophore labels may be attached to a complexing agent by standard methodologies.
[0048] Fluorescent molecules may include quantum dots, such as QDots™ (Invitrogen). Thus, the complexing agents used in the sequencing or physical characterization methods herein described may be labeled or tagged. The label may be an optically detectable species. The label may be a moiety capable of luminescence, light scattering, or wavelength shifting, and may be directly or indirectly associated with binding of the complexing agent with the N-terminal amino acid.
[0049] In another aspect, the methods of the present invention may involve the detection of a terminal amino acid having a detectable label, then removing the terminal amino acid using a removing agent specific for the terminal amino acid. For example, the methods herein described may utilize an N-terminal amino acid removing agent having specificity for one or more N-terminal amino acids of a peptide or subset of N-terminal amino acid moieties.
[0050] The methods herein disclosed may provide a high-throughput protein detection and characterization technique using single peptide molecule detection techniques. Single molecule detection may be achieved, for example, with a variety of different optical means using luminescent or fluorescent molecules. The label of the complexing agent used in the methods herein described may be detected by methods known in the art. For example, the label of the complexing agent may be detected with a suitably configured optical microscope. The emitted radiation may be directed by the microscope onto detection elements, such as a charge-coupled device (CCD) camera. The microscope may have unique optical filters each connected to a CCD camera, each optical filter corresponding to one label from a plurality of complexing agents so that each label may be detected and optionally recorded by the CCD camera. The CCD camera may then convert the emitted radiation into an electrical signal that may be read by a computer. Luminescence of the label detected by the microscope CCD may be fluorescence. By way of example, total internal reflection fluorescence (TIRF) microscopy may be used. Dye labels may be laser-excited using confocal, evanescent-wave or other geometries for low background detection of the individual labels. Radiolabels may be detected using standard radiometric techniques.
[0051] The complexing agent may comprise a metal colloidal particle. The metal colloidal particles preferably provide for strong absorption without substantial loss of the complexing agents' affinity for their binding or bindable counterpart. Depending on the circumstances, metal colloidal particles may be detected, for example, by direct visual examination, by microscopic techniques or spectrophotometric techniques.
[0052] The methods herein described may additionally comprise the step of contacting a secondary factor to the complexing agent that is bound to the N-terminal amino acid moiety to boost or modulate the signal derived from the binding of the complexing agent to the N-terminal amino acid moiety of the polypeptide, which may increase the sensitivity of the method. This secondary factor, for example, may be a second complexing agent as herein described such as a labeled aptamer, antibody, protein or compound that recognizes the complexing agent, the complex, or a tag that may be bound to the complexing agent. The secondary factor may provide for fluorescence resonance energy transfer (FRET) or may quench the fluorescence. Such secondary factors include, for example, green or red fluorescent protein (GFP and RFP, respectively) and their derivatives.
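Once each optical filter channel has been read out for a given location, assigning a label reduces to picking the channel with the strongest signal, provided it rises above background. The sketch below is a hypothetical illustration of that decision; the channel names, the intensity units and the fixed background threshold are all assumptions, and real single-molecule data would generally need a more careful noise model.

```python
from typing import Dict, Optional


def call_label(channel_intensity: Dict[str, float], background: float) -> Optional[str]:
    """Return the filter channel with the highest signal above background.

    `channel_intensity` maps a hypothetical channel name (one per label)
    to the intensity recorded at one surface location. Returns None when
    no channel exceeds the background threshold.
    """
    if not channel_intensity:
        return None
    channel, signal = max(channel_intensity.items(), key=lambda item: item[1])
    return channel if signal > background else None


if __name__ == "__main__":
    spot = {"filter_520nm": 35.0, "filter_580nm": 410.0, "filter_670nm": 28.0}
    print(call_label(spot, background=100.0))
```

A None result corresponds to a location where no complexing agent from the set was detected in that cycle.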
[0053] In the methods herein disclosed, the polypeptide may be digested before identifying the N-terminal amino acid or before determining at least a portion of the polypeptide sequence. Digested polypeptide includes polypeptide cleaved at specific amino acids into smaller peptides ("digested peptide") that may then be sequenced by the methods herein described. Thus, the method herein described includes forming a complex between an N-terminal amino acid moiety of a corresponding digested peptide from the polypeptide of interest and a complexing agent, as described above. Reconstruction of the polypeptide sequence may be provided using the determined sequences of the one or more of the digested polypeptides.
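Reconstruction from digested peptides can be pictured as joining fragments wherever the end of one read matches the start of another. The sketch below is a minimal greedy illustration, assuming error-free reads and fragments from two digests of the same polypeptide; the example sequence is invented, and the cleavage rules used to build it (after K or R for one digest, after F, Y or W for the other) are commonly cited specificities for trypsin and chymotrypsin rather than details taken from this document.

```python
from typing import List


def overlap_len(left: str, right: str, min_overlap: int = 2) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(fragments: List[str], min_overlap: int = 2) -> str:
    """Greedily extend the first fragment using overlapping fragments."""
    contig = fragments[0]
    pending = list(fragments[1:])
    progress = True
    while pending and progress:
        progress = False
        for frag in pending:
            if frag in contig:
                merged = contig  # already covered, nothing to add
            elif overlap_len(contig, frag, min_overlap):
                merged = contig + frag[overlap_len(contig, frag, min_overlap):]
            elif overlap_len(frag, contig, min_overlap):
                merged = frag + contig[overlap_len(frag, contig, min_overlap):]
            else:
                continue  # no usable overlap yet, try the next fragment
            contig = merged
            pending.remove(frag)
            progress = True
            break  # restart the scan with the extended contig
    return contig


if __name__ == "__main__":
    # Hypothetical reads from two digests of one invented polypeptide.
    digest_a = ["MK", "TAYIAK", "QR", "QISFVK"]
    digest_b = ["MKTAY", "IAKQRQISF", "VK"]
    print(assemble([digest_b[0]] + digest_a + digest_b[1:]))
```

Fragments that cannot be placed are simply left out by this sketch; short minimum overlaps can also produce spurious joins, so a practical implementation would need additional checks.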
[0054] Specific cleavage of polypeptides into digested peptides may be achieved by chemical or enzymatic methods. The mixture of peptides obtained by specific chemical or enzymatic cleavage may be identified and/or sequenced or they may be separated prior to identification and/or sequencing, for example, by chromatography or SDS-PAGE. Non-limiting methods of specifically cleaving polypeptide chains are provided in Table 1. Table 1.
[Table 1 image: imgf000017_0001]
[0055] The methods herein described may be useful for polypeptides of either known or unknown structure. In the case of a known or putative structure, as where synthetic polypeptides are obtained from a commercial supplier or isolated from a glycoprotein of known or suspected structure, for example, a combination of cleavage agents may be designed to verify or confirm the putative structure or sequence. For example, the amino acid sequences of digested segments of the original polypeptide may be recombined, for example, by overlapping peptide sequences derived from the polypeptide cleaved with a second enzyme/chemical agent that cleaved the polypeptide chain at different linkages. By way of example, trypsin cleaved sequences may be overlapped with chymotryptic sequences derived from the same polypeptide. Such overlapping peptide analysis procedures may be integrated into the method herein described.
[0056] If the polypeptide of interest is present in a sample comprising several polypeptide chains, purification or isolation of all or a portion of the sample may be performed, for example, using SDS-gel electrophoresis under reducing conditions. The number of distinct N-terminal amino acids or the total polypeptide content may be determined prior to a separation/purification procedure. The sample containing the polypeptide of interest may be denatured using denaturing agents, such as urea or guanidine hydrochloride, before sequence determination of the individual chains. The sample containing the polypeptide of interest may be treated with reducing agents such as 2-mercaptoethanol or dithiothreitol to cleave disulfide bonds or deactivate thiols. The sample containing the polypeptide of interest may be alkylated with iodoacetate to form stable S-carboxymethyl derivatives to prevent cysteine residues from recombining. Identification and/or sequencing may then be performed as heretofore described.
[0057] A protein may be modified after translation.
Specific side chains of the protein may be altered. Polypeptides of interest may be bound to another molecule such as, for example, glycolipids or a glycan moiety. Such polypeptides may be analyzed according to the methods herein described while still attached to the molecule or such other molecule may be cleaved prior to analyzing the polypeptide. The methods herein described may provide for determining information as to whether an amino acid sequence comprising post-translational modifications is present or that a specific modified amino acid is present. For example, an aptamer with specificity for a known post-translational modification may be used.
[0058] The method for sequencing the polypeptide herein disclosed may include the step of removing one or more of the N-terminal amino acids with an N-terminal amino acid removing reagent. The N-terminal amino acid removing reagent(s) that may be useful according to the methods herein described will depend upon the nature of the N-terminal amino acid of the polypeptide and the sequence or type of structural information desired. Several N-terminal amino acid removing reagents are known in the art for polypeptides. Aminopeptidases are commercially available, for example from reagent suppliers such as Sigma Chemicals (St. Louis, Mo.) and Oxford
Glycosystems (Rosedale, N. Y.). Chemical agents may include acids, for example, for providing hydrolysis of the N-terminal amino acid moiety in accordance with the methods disclosed herein. Alternatively, aminopeptidases may be developed using mutagenesis of known aminopeptidases to produce aminopeptidases having different N-terminal amino acid removing capabilities. For example, in some embodiments, aminopeptidases may be developed using mutagenesis techniques to produce aminopeptidases having specificity for different N-terminal amino acids. Methods for developing such aminopeptidases are provided in the Examples below.
[0059] Any of the aforementioned N-terminal amino acid removing reagents may be suitable for elucidating the structure of the polypeptide, including its sequence, according to the method herein described. Combinations of the above-described individual removing agents may be used. For example, chemical removing agents may be used with enzymatic removing agents. Two or more removing agents may be used simultaneously or sequentially on a polypeptide. The specific combination and the circumstances under which such a combination is appropriate will depend upon the nature of the polypeptide and the information desired.
[0060] In embodiments, the N-terminal amino acid removing agent comprises the agents of the Edman degradation procedure. By way of example, the Edman degradation procedure comprises forming a derivative of the N-terminal amino acid by reacting phenyl isothiocyanate (PITC) with the N-terminal amino acid under basic conditions (e.g., N-methylpiperidine/methanol/water) to form a phenylthiocarbamyl derivative (PITC-N-terminal amino acid). Trifluoroacetic acid may then be used to cleave off the PITC-N-terminal amino acid as its anilinothiazolinone derivative (ATZ-amino acid), leaving a new amino terminus. This process may be repeated in combination with the complexing and detecting method herein described to provide N-terminal amino acid sequence information for the polypeptide. The ATZ-amino acid may be removed by extraction with n-butyl chloride and converted to a phenylthiohydantoin derivative (PTH-amino acid) with 25% TFA/water for complementary analysis if desired (for example, using a reverse-phase C-18 column with UV detection at 280 nm).
[0061] Unmodified cysteine residues, which may interfere with the Edman degradation procedure as described above, may be modified by known methods and chemistries. Blocked amino termini (e.g., an amino terminus that is glycosylated or phosphorylated) of the protein or peptide may resist removal during the N-terminal amino acid removal procedure. De-blocking procedures may be performed by known chemistries and methods. [0062] The methods herein described may be complemented with other methods of determining the N-terminal amino acids or other methods of determining the sequence of a polypeptide, concurrently or subsequently with the methods herein described. For example, the cleaved terminal amino acid(s) may be concurrently or subsequently characterized using HPLC or mass spectrometric methods. For example, the phenylthiocarbamoyl derivative of the N-terminal amino acid may be liberated and the cyclic phenylthiohydantoin (PTH)-amino acid may be identified by chromatographic methods such as high-pressure liquid chromatography, gas-phase sequenators, mass spectroscopy and/or other procedures. The complementary procedure may be repeated as needed or desired. The methods herein described may be complemented with protein purification procedures such as SDS-polyacrylamide gel electrophoresis. Such procedures may be automated and/or integrated with the methods herein described. Other complementary procedures may be performed as needed to assist in the ascertainment of the protein's structure, for example, the positions of the original disulfide bonds may be obtained by using a diagonal electrophoresis technique to isolate the peptide sequences containing such bonds. [0063] The methods described herein may provide a process for determining at least a portion of an amino acid sequence of a polypeptide of interest. 
The process comprises forming a complex between an N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for at least one N-terminal amino acid moiety of the polypeptide. Multiple complexing agents having specificity for different N-terminal amino acid moieties may be contacted with the polypeptide, for example, the complexing agents may be selected from a set of complexing agents, where each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection. Detecting the label in the complex provides for identifying the N-terminal amino acid of the polypeptide. The complexing agent and/or the N-terminal amino acid moiety may be removed from the polypeptide such that the adjacent amino acid may be identified in order to determine at least a portion of the amino acid sequence of the polypeptide of interest. This cycle may be repeated, for example, 6-10 times in order to obtain enough sequence information for identification of a known peptide sequence, or repeated, for example, 10-16 times for characterization of an unknown sequence. More or less repetitions of this cycle may be used.
[0064] The above method may be carried out in solution or may be adapted to a surface, solid support, beads, and/or array or microarray. [0065] The method of identifying and/or sequencing a polypeptide of interest may be optimized for automation using a solid support or an array. For example, the polypeptide (or a digested portion of the polypeptide) is deposited onto a surface in an orderly manner such that it is separated from other polypeptides, such as to provide a microarray. The surface of the solid support or the array may provide bonding in an orderly fashion. For example, the surface may comprise equally spaced binding sites for one or more of the same or different polypeptides. The binding site may be situated such that the C-terminus of the polypeptide will bind preferentially. The surface may be treated to reduce non-specific binding. The surface may be patterned to facilitate containment of the polypeptide to a region on the surface and/or create a reaction chamber to facilitate the binding of the polypeptide to the surface or to facilitate complex formation/removal or terminal amino acid removal. The polypeptide may be covalently or non-covalently attached to the surface. [0066] By way of example, polypeptides may be sequenced using a solid support or microarray comprising a surface amenable to bonding of the polypeptide. Thus, a method for sequencing may comprise bonding the polypeptide to a solid support and forming a complex between the N-terminal amino acid and a labeled complexing agent, and a detection step comprising the detection of the label of the complex. [0067] Surfaces of a solid support or an array useful for binding to a polypeptide will depend upon the type of polypeptide being analyzed and the type of method being performed. For example, the polypeptide of interest may be attached to the surface of the solid support at or near its C-terminus. 
Solid supports useful for binding to polypeptides in the art includes, for example, glass beads, cellulose beads, polystyrene beads, SEPHADEX beads, SEPHAROSE beads, polyacrylamide beads and agarose beads (see, e.g., Ghosh, et al., "Covalent Attachment of Oligonucleotides to Solid Supports," Nucleic Acids Research. 15:(13) 5353-5372 (1987); U.S. Pat. No. 4,992,383 (Farnsworth); both of these references are hereby incorporated by reference herein). The solid support may be silica, silicon, glass such as borosilicate glass, or plastic functionalized to enable covalent or non-covalent coupling. Functionalized surfaces of the solid support may be obtained using conventional silanization methods to incorporate reactive groups, or by thin-film deposition of polymers containing reactive functional moieties. The functional group is chosen to facilitate covalent binding of polypeptides, preferably through the C-terminus. Preferably, the surface is otherwise passive to the absorption of complexing agents that bind the polypeptides. Preferably, the functional group is an amine that terminates a surface-bound linker, to which the C-terminal amino acid may covalently couple, for example, in the presence of imidazole and a carbodiimide (e.g., EDAC). The surface may additionally be patterned, such as in a microarray. Other patterning, such as patterns of hydrophilic patches separated by hydrophobic regions, or patterns of surface depressions (nanowells) may be used, which may be obtained by replication from a master generated by standard lithographic techniques. The polypeptide may be attached at both the C-terminus and N-terminus followed by cleavage of at least one peptide bond of the bound polypeptide to provide at least one accessible N-terminal amino acid. 
[0068] The surface may be contacted with one or more complexing agents, the complexing agents selected from a set of complexing agents, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection, under conditions that favor complexing agent binding with the N-terminal amino acid moiety of the polypeptide(s). The surface onto which the polypeptide is deposited may be washed before and after a complexing agent is bound to the polypeptide. The identity of the N-terminal amino acid for the polypeptide at a specific location on the surface may be determined as described above. A compilation of the amino acids corresponding to a particular location on the surface may be generated and/or stored digitally for manipulation. The portion of determined amino acid sequence corresponding to the polypeptide may comprise gaps (e.g., corresponding to amino acids not identified). Gaps may be included in a compiled sequence for a polypeptide, for example, as a random amino acid or an amino acid of a particular functionality, acidity, etc.
[0069] Optionally, more than one N-terminal amino acid moiety may be sequentially removed and the above sequence of steps repeated to provide at least a portion of the sequence of a polypeptide. The sequencing method comprises the step of identifying sequential N-terminal amino acids of a polypeptide, such as a protein, to provide for identification of the protein. The sequencing method may optionally comprise using an N-terminal amino acid removing reagent to remove the N-terminal amino acid from the polypeptide.
[0070] By way of example, a microarray-based polypeptide sequencing procedure is described below for a polypeptide of interest. Amino acid-at-a-time sequencing of a polypeptide is accomplished by the repeated sequential identification and removal of the N-terminal amino acid of a polypeptide whose sequence of amino acids is to be determined. The polypeptide of interest may be purified or present in a mixture of polypeptides and/or may be digested. The polypeptide or digested peptide may be held fixed, at the C-terminal end or other position distal to the N-terminal amino acid to be removed, at a specific location on the microarray support or surface. The process may be multiplexed, such that complexing and N-terminal amino acid moiety removal may be performed on the array surface. A plurality of spatially separated locations (NxM array) may be provided on the surface, with each location containing one polypeptide or a single digested peptide from one polypeptide. Subsequent parallel processing and readout of the surface-bound polypeptides may greatly improve the effective sequencing rate. The use of a microarray may provide for mapping of the polypeptides on the surface and the recording of the sequence of amino acids from the polypeptide at specific locations on the surface. The quantity of an identified polypeptide on a surface may be determined to provide absolute quantification of the polypeptide in a sample.
[0071] The steps for detecting and identifying the N-terminal amino acids and/or sequencing at least a portion of polypeptides spatially arranged on a microarray may be performed as follows. One or more complexing agents, such as a collection of aptamers, each aptamer specific to an N-terminal amino acid moiety, and each aptamer having a unique label, preferably a dye or group of dyes or dye-dye pairs (Fluorescence Resonance Energy Transfer (FRET)) that yield a distinguishable detectable measurement, e.g., spectral or temporal luminescence properties, may be contacted with the surface-bound polypeptide. The concentration of each type of aptamer may be adjusted to be approximately 10-100 times the value of the known or estimated equilibrium binding constant for its N-terminal amino acid moiety ligand(s) such as to provide a sufficient equilibrium concentration for complex formation. The substrate microarray containing the surface-bound polypeptides may be incubated after contact with the collection of aptamers for a sufficient time to allow equilibrium to be reached. This surface may then be washed to substantially reduce or eliminate non-bound aptamers and the weak, non-specifically-bound or surface-bound aptamers. The wash time should be short enough so that N-terminal amino acid moiety-bound aptamers are not substantially removed. Excess aptamer may be collected and recycled. The surface may be dried to further immobilize the specifically-bound aptamers at the location of their respective N-terminal amino acid ligand. The substrate may then be scanned under appropriate conditions, and the labeled aptamers detected and/or recorded as a function of their specific locations on the surface. By detection and discrimination of the complexes, a map of the identity and location of N-terminal amino acid moieties on the surface may be obtained and optionally archived.
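The choice of an aptamer concentration of roughly 10-100 times the binding constant can be illustrated with a short calculation. The sketch below is not part of the disclosed method; it assumes that the "equilibrium binding constant" is an equilibrium dissociation constant (Kd), that binding is simple 1:1, and that the aptamer is in large excess so that its free concentration is approximately its total concentration. The function name `fraction_bound` is hypothetical.

```python
def fraction_bound(aptamer_conc: float, kd: float) -> float:
    """Equilibrium fraction of N-terminal sites occupied by aptamer.

    Assumes simple 1:1 binding with the aptamer in large excess, so that
    fraction = [A] / ([A] + Kd). Both arguments must share the same units.
    """
    if aptamer_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return aptamer_conc / (aptamer_conc + kd)


if __name__ == "__main__":
    kd = 1.0  # arbitrary units; only the ratio [A]/Kd matters here
    for fold in (1, 10, 100):
        print(f"{fold:>3} x Kd -> fraction bound = {fraction_bound(fold * kd, kd):.3f}")
```

Under these assumptions, an aptamer at 10 times Kd would occupy about 91% of its cognate N-terminal sites at equilibrium and at 100 times Kd about 99%, which is consistent with the stated range providing a sufficient equilibrium concentration for complex formation.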
[0072] The sequence of a polypeptide of interest or a portion thereof may be compared with some or all other known polypeptide sequences. Comparison may be to identify the polypeptide and/or ascertain similarities, evolutionary history, or identify pathogenic origin or similarity and the like. Comparison of a relationship of a sequence obtained by the methods herein described with that of some or all known polypeptide sequences may be performed using a computer containing or having access to a library of sequences. Comparison of the identified sequence may include database searching techniques.
[0073] The methods herein disclosed may include reconstructing protein sequences obtained from sequenced N-terminal amino acids identified from a sample so as to identify a protein in a sample. Thus, the methods disclosed may be useful for determining whether an organism has been modified and/or whether the type of modification made (e.g., increasing lethality) has an impact on its virulence. This may be accomplished by taking a single sample, dividing it into portions, and then digesting each portion with a different protease. The peptides generated will then have overlapping sequences that will make it possible to reconstruct the original protein before its digestion using standard protein reconstruction techniques. For example, tryptic peptides combined with peptides from a Glu-C digest, which cleaves at aspartic and glutamic acids, may be used to reconstruct the sequence from a B. anthracis spore coat protein. Database calculations may be performed to determine how many proteases would be necessary to reconstruct a large number of proteins, for example, from B. anthracis, given that a mixture of digested peptides from a large number of different proteins will be present.
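The reconstruction from overlapping digests can be sketched as follows. This is an illustrative example only, not the reconstruction technique of the disclosure: it uses a simple greedy overlap merge, an invented 22-residue test sequence, and simplified cleavage rules (trypsin modeled as cutting after K or R, Glu-C as cutting after D or E). The function names `digest`, `overlap`, and `assemble` are hypothetical, and the sketch assumes complete, error-free peptide sequences from a single protein.

```python
def digest(protein: str, cleave_after: str) -> list[str]:
    """Split a sequence after every residue listed in cleave_after."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str], min_overlap: int = 1) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap."""
    kept: list[str] = []
    for frag in sorted(set(fragments), key=len, reverse=True):
        if not any(frag in other for other in kept):  # drop contained fragments
            kept.append(frag)
    while len(kept) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(kept):
            for j, b in enumerate(kept):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        if best_k < min_overlap:
            break  # nothing left to join; return the remaining contigs
        merged = kept[best_i] + kept[best_j][best_k:]
        kept = [f for n, f in enumerate(kept)
                if n not in (best_i, best_j) and f not in merged]
        kept.append(merged)
    return kept


if __name__ == "__main__":
    protein = "MAEKLTDGRSVEWKAYDQRFHE"      # invented sequence for illustration
    tryptic = digest(protein, "KR")         # simplified trypsin rule
    glu_c = digest(protein, "DE")           # simplified Glu-C rule
    print(assemble(tryptic + glu_c))
```

Because each tryptic peptide spans a Glu-C cleavage site and vice versa, the two digests together tile the sequence. With a mixture of peptides from many proteins, short spurious overlaps become likely, which is one reason additional proteases or a larger `min_overlap` may be needed.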
[0074] Sequences generated with missing amino acids, for example, during detection, not having the PITC group label a peptide, or not having the PITC-amino acid cleave off, may still be searchable and/or provide sufficient information for identification by database searching techniques. Missing amino acids in the sequence result in different database searching problems, for example, gaps in the sequence, gaining a "phantom" missing amino acid, or detecting the same amino acid twice. These issues may be compounded by the possibility of amino acids without aptamers or having aptamers that are not specific to one amino acid. The most important amino acids may be assumed to be those that are the most common in proteins, such as leucine, serine, alanine, glycine, etc., with the exception that lysine and arginine will be the least useful since they typically are found at the C-terminus, which will likely often not be sequenced. After determining statistically the effect of missing a single aptamer, the effect of missing multiple aptamers may be determined. Because of the computationally high number of combinations of multiple missing amino acid specific aptamers (ASAs) that would be generated, only the worst and best case scenarios may be employed, for example, that of losing the most informative amino acids and that of losing the least informative amino acids, respectively.
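A sequence containing gaps can still be searched against a database by treating each unidentified position as a wildcard. The sketch below is a minimal illustration under stated assumptions: the database entries and names (`protA`, `protB`, `protC`) are invented, the gap symbol `X` is a convention chosen here, and only substitution-type gaps are handled. It does not model "phantom" or duplicated amino acid calls, which would require an alignment method that tolerates insertions and deletions.

```python
import re


def gapped_pattern(read: str, gap: str = "X") -> re.Pattern:
    """Build a regular expression in which each gap matches any one residue."""
    return re.compile("".join("[A-Z]" if c == gap else re.escape(c) for c in read))


def search(read: str, database: dict[str, str]) -> list[str]:
    """Return the names of database sequences that contain the gapped read."""
    pattern = gapped_pattern(read)
    return [name for name, seq in database.items() if pattern.search(seq)]


if __name__ == "__main__":
    # Invented sequences, for illustration only.
    database = {
        "protA": "MAEKLTDGRSVEWKAYDQRFHE",
        "protB": "MSTLKDGRAVEWQ",
        "protC": "MKLTEGRSIEW",
    }
    print(search("LTXGRSXEW", database))   # nine cycles, two gaps
    print(search("LTXGRSXEWK", database))  # one further cycle
```

In this example the nine-residue read with two gaps is compatible with two entries, and one additional sequencing cycle resolves the ambiguity, which illustrates how the number of missing aptamers trades off against the number of cycles needed for identification.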
[0075] The methods disclosed herein may be used to directly sequence single peptide molecules as an assessment tool of virulence, organism viability, or infection (evaluating plasma, serum, urine, or any other tissue or fluid that may contain peptide molecules). Information regarding bioengineered resistance genes or other genetic manipulation of organisms may be obtained by the methods herein disclosed.
[0076] Referring now to the Figures, various illustrative embodiments will be described. A general overview of the methods described herein is depicted in FIG. 1. Thus, a sample containing a polypeptide of interest may be extracted and digested as depicted in steps A and B using standard protocols. The resulting peptides from the sample, which may number in the tens to hundreds of millions, may be bound to a surface such as a microscope slide or microarray format, as depicted in step C. The peptides may be bound to the surface such that the N-terminal amino acid is available for interaction with agents or for chemical modification. The N-terminal amino acid may be derivatized for an N-terminal amino acid removal procedure or the derivative may be formed before or after complex formation with the complexing agent.
[0077] The bound peptides of the sample may be contacted with an N-terminal amino acid specific complexing agent as depicted in step D of FIG. 1. Detection of the label corresponding to the complexing agent of the N-terminal amino acid-complexing agent complex provides for determining the identity of the N-terminal amino acid. Such detection may be at the single molecule detection level. Suitable detection methods may include total internal reflection fluorescence (TIRF) microscopy. Association of the N-terminal amino acid of a specific peptide may be correlated with a corresponding position on the slide or microarray.
This information may be stored digitally such that it is readily accessible for searching, additional manipulation, or archiving as depicted in step E.
[0078] After determining the identity of the N-terminal amino acid, the complexing agent or set of complexing agents may be removed or washed off the slide. The sample may be subjected to an N-terminal amino acid removal procedure to remove the N-terminal amino acid of the peptide. The N-terminal amino acid may be removed, as depicted in step F, and the peptides may again be contacted with the complexing agent or set of complexing agents with subsequent identification of the amino acid. The complexing agent(s) may be the same as those used previously or they may be different. Detection of the detectable label of the complexing agent-N-terminal amino acid complex provides for determining the identity of the next sequential N-terminal amino acid. The process may be repeated a number of times such as to obtain at least a portion of the sequence of the peptide. The portion of the sequence of the peptide may be used for identification of the peptide and/or the protein from which the peptide was derived.
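The identify-then-remove cycle described above can be summarized as a short simulation. This is an idealized sketch rather than a model of the chemistry: it assumes that every cycle removes exactly one residue, that an available complexing agent always identifies its residue correctly, and that a residue with no complexing agent is recorded with a gap symbol. The function name `read_n_terminus`, the gap symbol `X`, and the example peptide are hypothetical.

```python
def read_n_terminus(peptide: str, recognized: set[str], cycles: int, gap: str = "X") -> str:
    """Simulate repeated N-terminal identification and removal.

    peptide    -- sequence written from N-terminus to C-terminus
    recognized -- residues for which a labeled complexing agent is available
    cycles     -- number of identify/remove cycles to run
    """
    calls = []
    remaining = peptide
    for _ in range(cycles):
        if not remaining:
            break                      # peptide fully consumed
        n_terminal = remaining[0]
        # Detection step: a label is seen only if an agent recognizes the residue.
        calls.append(n_terminal if n_terminal in recognized else gap)
        # Removal step: expose the next residue as the new N-terminus.
        remaining = remaining[1:]
    return "".join(calls)


if __name__ == "__main__":
    agents = set("LTGRSEWKAY")         # assumed set; no agent for D or V
    print(read_n_terminus("LTDGRSVEWK", agents, cycles=8))
```

The gapped read produced this way is the kind of partial sequence that may then be used for identification of the peptide or its parent protein.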
[0079] Referring now to FIG. 2, an example of a peptide sequencing assay using an aptamer complexing agent for complexing with a surface bound peptide derivative for providing a detectable complex is depicted. Thus, a surface bound peptide having an N-terminal amino acid is contacted with a phenylisothiocyanate compound as depicted in step 1. The derivatized amino acid is contacted with a labeled aptamer as depicted in step 2. The aptamer-amino acid complex is detected using imaging techniques as depicted in step 3. N-terminal amino acid removal, for example, using an Edman degradation procedure, removes the N-terminal amino acid of the peptide as depicted in step 4. Steps 1-3 may be repeated or steps 1-4 may be repeated to provide at least a portion of the sequence of the surface bound peptide. In the event that a particular complexing agent is unavailable for a particular amino acid, the lack of specific sequence information corresponding to that specific N-terminal amino acid of the peptide may be indicated, for example, as a random amino acid in the sequence.
[0080] Referring now to FIG. 3, an example of a peptide sequencing assay using an N-terminal amino acid removing agent specific for one or more N-terminal amino acids is depicted. In this embodiment, peptides are produced from a protein sample using a tryptic digest, as depicted in step 1. The tryptic peptides having N-terminal amino acids are bound to the surface of a slide by covalently attaching the peptides using isothiocyanate as depicted in step 2. The N-terminal amino acid of each peptide is labeled using biotin as depicted in step 3. Labeled amino acids are detected using imaging techniques as depicted in step 4. N-terminal amino acid removal, for example, using a specific N-terminal amino acid aminopeptidase, removes the respective N-terminal amino acid of the peptide as depicted in step 5.
Steps 4-5 may be repeated for each specific N-terminal amino acid aminopeptidase to provide at least a portion of the sequence of the surface bound peptide. After all of the N-terminal amino acids have been imaged and removed, steps 3-5 may be repeated to obtain the identity of the subsequent N-terminal amino acids. The steps are repeated until the desired number of amino acids in a sequence is identified.
[0081] The methods herein described may be complemented by determining the total amino acid composition of the peptide. For example, the polypeptide sample may be hydrolyzed into its constituent amino acids by heating it in 6 N HCl at 110 °C for 24 hours. Amino acids in hydrolysates may be separated and characterized, for example, by ion-exchange chromatography on columns of sulfonated polystyrene or by mass spectrometry. Quantification of the peptide may be obtained by reaction with ninhydrin or fluorescamine and determination of the optical absorbance of the solution.
[0082] The methods herein described may provide for sequencing or characterizing a single polypeptide or a quantitative or qualitative amount of polypeptide, for example at the single molecule level or below the sub-femtomolar range. By way of example, after a portion of a polypeptide of interest has been determined, the quantity of that portion of that polypeptide of interest (or the corresponding polypeptide of origin, e.g., protein) may be quantified. The amount of quantified polypeptide may be ratiometrically compared with the total polypeptide content of the sample to provide an absolute quantification of the polypeptide of interest. Preferably, multiple polypeptides of interest from a sample may be randomly positioned on a surface such as a microchip or microarray. The surface of the microchip or microarray may be scanned to determine the quantity of polypeptide of interest on the surface and quantified as a proportion of the total polypeptide content of the sample.
[0083] Absolute quantitation of peptides using the methods herein disclosed may be obtained through counting the number of times each unique peptide is identified. By correlating the number of times a peptide is identified with the amount of peptides loaded (or the number of cells used), an absolute copy number of that molecule in the sample or cell may be determined. For example, tryptically digested proteins from a sample may provide multiple peptides representing each protein in the assay. The quantities of the peptides may be averaged and statistics used to minimize sampling error and/or more accurately determine the quantity of the protein. In addition, the multiple peptides of the digest cross-validate each other's determination of the protein level, with any discrepancies signaling a potentially relevant event, such as a post-translational modification (PTM) or alternative splicing, for example.
[0084] The methods herein disclosed may provide profiling of the entire protein content of the organism, and the specific ways in which the organism has been modified may be identified without a priori knowledge of what those genetic changes may be. In addition, by profiling the whole organism and understanding what changes have been made to its genetic code, more specific treatments for those already exposed may be provided and/or tailored preventive measures may be more quickly administered, for example, for preventing the spread of the biological threat.
[0085] Applications of the methods herein described include, but are not limited to, detection of biological warfare agents, drug discovery, proteomics biomarker identification and high-throughput screening. The methods herein described may be useful for discovering cancer biomarkers, elucidating the molecular mechanism of cancer, enabling new medical diagnostic tools through profiling of single cells and/or accelerating the drug discovery processes.
The methods herein described may be useful for proteomics, such as analyzing peptides on the single molecule level, sequencing of millions of peptides in parallel and/or providing absolute quantitation information. Structural information obtained from the methods herein described includes, for example, information or suggestion regarding attributes of the primary structure of the polypeptide which may be derived from the interaction of the complexing agent with the N-terminal amino acid of the polypeptide, the amino acid composition of the polypeptide, the order in which the amino acids are linked and amino acid post-translational modification.
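The counting-based absolute quantitation of paragraph [0083] can be sketched as below. This is a hypothetical illustration, not a validated procedure: it assumes every loaded peptide molecule is read and identified, that each listed peptide occurs once per copy of its parent protein, and that the peptide-to-protein mapping is known. The function name `copies_per_cell`, the peptide strings, and the protein names `P1` and `P2` are invented for the example.

```python
from collections import Counter, defaultdict
from statistics import mean


def copies_per_cell(identified: list[str],
                    peptide_to_protein: dict[str, str],
                    n_cells: int) -> dict[str, float]:
    """Estimate protein copies per cell from single-molecule peptide counts.

    identified         -- one entry per peptide molecule identified on the surface
    peptide_to_protein -- maps each expected peptide to its parent protein
    n_cells            -- number of cells whose protein content was loaded
    """
    counts = Counter(identified)
    per_protein = defaultdict(list)
    for peptide, protein in peptide_to_protein.items():
        per_protein[protein].append(counts.get(peptide, 0))
    # Average the counts of all peptides from the same protein, then normalize.
    return {protein: mean(c) / n_cells for protein, c in per_protein.items()}


if __name__ == "__main__":
    reads = ["AEK"] * 200 + ["LTR"] * 180 + ["GGK"] * 50   # invented counts
    mapping = {"AEK": "P1", "LTR": "P1", "GGK": "P2"}
    print(copies_per_cell(reads, mapping, n_cells=10))
```

A large disagreement between the counts of two peptides from the same protein (here 200 versus 180 for the hypothetical `P1`) is the kind of discrepancy that may point to a post-translational modification or alternative splicing.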
Examples Example 1
Process for N-terminal Amino Acid Specific Aptamer (ASA) Screening [0086] By way of example, selection of complexing agents suitable for use in the methods herein described may be derived by creating an affinity column with an amino acid, derivatized amino acid or post-translationally modified amino acid attached to it. Mixtures of random complexing agents may be screened for affinity and selectivity using the affinity column, and then isolated and/or optionally amplified to provide complexing agents that bind with the amino acid. For example, aptamer selection may be carried out, e.g., following the methods of Gold, et al. (U.S. Pat. No. 5,270,163), which describes the "SELEX" (Systematic Evolution of Ligands by Exponential Enrichment) method.
[0087] The SELEX process may be used to screen for amino acid and/or PITC derivatized amino acid selectivity, for example. First, the ligand of interest, in this case an individual amino acid(s), is conjugated to a support matrix. The support matrix may be a chromatography media with free amino groups so that the amino acids can be conjugated through their carboxyl groups. To make the PITC derivatized amino acids for selection, the PITC reagent is reacted with the amino acids on a column. An aptamer library may then be synthesized with a pair of known primers surrounding a region of random nucleotides. Typically, ~10^14 different aptamers are used as the starting pool for selection, which roughly equals 100s of picomoles of aptamers. The aptamer library may then be run over the column so that mainly those molecules that bind either the epitope or the support matrix remain. These aptamers may then be affinity eluted with the appropriate amino acid or with high salt. This first round aptamer pool may then be amplified by PCR. This process may be repeated 5-20 times to isolate highly specific aptamers. The above screening protocol may be adjusted with the following modifications and additions. First, in later rounds, random mutagenic PCR may be used to provide possible nucleotide permutations to find the most specific aptamer structure. Second, counter-selection on support matrix derivatized with PITC may be used to remove aptamers that recognize either the support material or only the PITC motif. Third, before elution of the aptamers from the affinity column, other AA-PITC conjugates may be used in counterselection to eliminate aptamers that are not highly specific to the epitope of interest. Nontraditional nucleotides, such as nucleotides derivatized with amines or fluorine, or RNA aptamers may be used in aptamer selection.
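The stated size of the starting pool can be checked with a brief calculation. The sketch below converts a pool of about 10^14 aptamer molecules to picomoles using the Avogadro constant and, as an additional illustration not stated in the text, finds the shortest random region whose sequence space (4^n for n nucleotides) could contain that many distinct members. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_picomoles(n_molecules: float) -> float:
    """Convert a number of molecules to picomoles."""
    return n_molecules / AVOGADRO * 1e12


def min_random_length(n_sequences: int) -> int:
    """Shortest random region (in nucleotides) with at least n_sequences variants."""
    length = 0
    while 4 ** length < n_sequences:
        length += 1
    return length


if __name__ == "__main__":
    pool = 10 ** 14
    print(f"{molecules_to_picomoles(pool):.0f} pmol")
    print(f"{min_random_length(pool)} nt minimum random region")
```

The conversion gives roughly 166 pmol, in line with the "100s of picomoles" stated above, and a random region of at least 24 nucleotides would be needed for 10^14 different sequences to exist.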
[0088] An exemplary generation of aptamers against an amino acid by the SELEX process is now described. Selection media for the SELEX process may be CarboxylLink media (Pierce, Rockford, IL). Selection column size may be approximately 100 μL of media packed into miniature homemade columns. Fmoc derived amino acids (Bachem, King of Prussia, PA) may be coupled to the media through reaction with 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) in 100 mM MES Buffer (100 mM 2-(N-morpholino)ethanesulfonic acid, 0.9% NaCl) (Pierce, Rockford, IL). The use of Fmoc blocked amino acids may stop the reaction from forming polymer chains of the amino acids on the media. The Fmoc group may then be released by washing the column with base. The column may then be washed with methanol/10 mM HEPES pH 8.3 (methanol/HEPES) to prepare for PITC coupling. Each column may then be equilibrated with 1% PITC in methanol/HEPES and allowed to stand at RT for 30 minutes. To drive the reaction to 100% completion, the PITC incubation may be repeated three times. A counter selection column may be generated by simply conjugating the media with PITC reagent.
[0089] One nmol of synthesized aptamer library may be dissolved in 300 μl of Aptamer Selection Buffer (AS Buffer), which consists of 100 mM HEPES pH 7.0, 250 mM NaCl, 5 mM MgCl2. The aptamer may then be heated to 95° C for 10 minutes and then snap-cooled on ice to facilitate folding. The column may then be washed with 10 column volumes of AS Buffer to remove unbound ASAs. ASAs may be eluted with 5 mM of the appropriate amino acid or AA-PITC in AS Buffer. In the cases where the PITC labeled amino acids do not dissolve in the AS Buffer, 1 M salt in AS Buffer may be used to elute the aptamers, though more rounds of selection and counter selection may be necessary. The elution buffer may be heated to 95° C directly before running onto the column to cause aptamers with very low Kd to dissociate from the column. The elution buffer may then be allowed to cool on column so that those aptamers that recognize the column rebind. The eluted aptamers may then be precipitated with ethanol and then amplified by PCR. Two rounds of PCR may be performed for every round of aptamer selection. The first round of PCR may be just amplification of the eluted aptamers in selection rounds 1-2 and then mutagenic PCR in all later rounds. The second round of PCR may use a 10-fold excess of the 3' primer (100 pmol versus 10 pmol of the 5' primer) to preferentially generate one strand, using a small amount of the first round PCR product as starting material. After the first three rounds of selection, counter selection columns may be used to further deplete those aptamers that show specificity to the media and to remove those aptamers that only recognize the PITC group. To generate aptamers that possess a high degree of amino acid specificity, mixtures of the five amino acids most structurally similar to the amino acid being selected may be washed over the column before elution, under conditions identical to those described for elution.
Isolation of individual aptamers may be accomplished by cloning the ASAs from rounds 10 and 15 of selection into plasmids. The plasmids may then be transformed into E. coli. The transformed E. coli may be selectively plated and individual colonies may be isolated.
[0090] The specificity of the aptamers may be tested against other amino acids. To test the specificity of the aptamers, individual amino acids and AA-PITCs may be conjugated to glass slides, and fluorescently labeled aptamer isolates may then be incubated on the slide to determine binding strength. Derivatized glass microscope slides covered with primary amines (Corning, Acton, MA) may be sectioned off into a series of individual wells using silicon mats (Grace Bio-Labs, Bend, OR). Individual Fmoc labeled amino acids may be conjugated in each well by incubating with EDC in MES Buffer at RT for 24 hours. The Fmoc group may then be removed with base. The slides may then be washed in methanol/HEPES buffer. The amino acids may be derivatized with PITC by incubating each well with 1% PITC in methanol/HEPES buffer for 1 hour at 37° C. The wells may then be washed with methanol/HEPES and AS buffers.
[0091] Plasmids isolated from individual E. coli colonies may then be used to amplify individual APs by PCR for testing. The 3' primer may be present in 10-fold excess and may be conjugated to a Cy3 molecule. After PCR, the products may be cleaned using a PCR Cleanup Kit (Qiagen, Valencia, CA). Cy3 labeled PCR products may then be resuspended in AS Buffer and incubated on the slide in each well. After washing with 20 well volumes (~20 μl) of AS Buffer, the amount of fluorescence on the slide may be assayed with a Typhoon Laser Scanner (Amersham, Piscataway, NJ) set to detect Cy3 molecules. Aptamers that show the greatest Cy3 signal intensity in the well of the appropriate amino acid compared to the other 19 amino acids may be used as aptamers for single molecule sequencing.
Quantum dots as detectable labels
[0092] Attachment of quantum dots to the amino-acid specific aptamers and optimization of the Edman degradation solution conditions may provide for maximization of PITC reactions and aptamer binding. Such optimization may provide attomole or zeptomole detection capability. For example, the ability to detect low levels of quantum dots, QDots™ (Invitrogen), for example, using a Typhoon Scanner (Hayward, CA), in solutions of QDots™ quantum dots spotted onto microscope slides at different concentrations has been demonstrated to provide at least 80 fmol, at least 800 attomole, and at least 80 attomole detection limits, without optimization of detection efficiency.
Protein Identification Using Limited Numbers of Complexing Agents
[0093] The methods herein disclosed may provide for the use of less than 20 uniquely specific amino acid complexing agents (e.g., aptamers) for peptide identifications. Probability calculations based on simple combinatorial numbers of recognizable amino acids, or groups of amino acids, as a function of the amount of sequence information, may be determined using standard statistical methods. The twenty amino acids may be subdivided into a limited number of distinguishable pools based on predicted aptamer availability. For example, four aptamers specific to leucine, serine, isoleucine, and alanine would divide the 20 amino acids into five pools, with the fifth pool containing all unrecognizable amino acids. For instance, three aptamers sequenced over a continuous stretch of five amino acids would provide 3^5 (243) distinct combinations. When considering the number of distinguishable entities versus the size of the theoretical peptide databases, for example, the E. coli proteome (80,000), human proteome (1,200,000), and human genome (600 million), a limited number of aptamers of less than 20 is needed for the identification of a peptide from a specific, finite database.
[0094] For example, the theoretical size of a database may be calculated by dividing the average size of a protein (e.g., 20 kDa for E. coli) by the average peptide size, 1000 Da, and multiplying that by the number of proteins in an existing database, for example, ~4000 proteins. In Table 2, the number of theoretically distinguishable peptides is highlighted by the size of the peptide database that could be searched assuming a 100-fold excess of distinguishable peptides to account for false positives. The numbers represent the E. coli proteome, the human proteome, and the human genome.
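The arithmetic in paragraphs [0093] and [0094] can be sketched as follows. This is a minimal illustration, not the calculation used to produce Table 2; the function names are hypothetical, and the only inputs taken from the text are the 20 kDa average protein, the 1000 Da average peptide, the ~4000 proteins, and the 100-fold excess.

```python
def theoretical_database_size(avg_protein_da, avg_peptide_da, n_proteins):
    """Peptides per protein (by mass) multiplied by the number of proteins."""
    return (avg_protein_da // avg_peptide_da) * n_proteins


def distinguishable_sequences(n_symbols, read_length):
    """Number of distinct strings of `read_length` drawn from `n_symbols` pools."""
    return n_symbols ** read_length


def min_read_length(n_pools, database_size, excess=100):
    """Smallest read length whose pool strings outnumber the database `excess`-fold."""
    if n_pools < 2:
        raise ValueError("at least two distinguishable pools are required")
    target = database_size * excess
    length = 1
    while distinguishable_sequences(n_pools, length) < target:
        length += 1
    return length


ecoli_size = theoretical_database_size(20_000, 1_000, 4_000)
print(ecoli_size)                       # 80000
print(distinguishable_sequences(3, 5))  # 243
print(min_read_length(5, ecoli_size))   # 10
```

Under these assumptions, five pools (four aptamers plus one "unknown" pool) require a read of ten positions to exceed the 80,000-entry E. coli database by 100-fold.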
Table 2. Aptamer pool calculations, for E. coli, the human proteome, and the human genome.
Figure imgf000033_0001
1 = E. coli; 2 = human proteome; 3 = human genome.
[0095] The data in Table 2 are based on a 100-fold increase in selectivity for recognizing unique peptides over the total number of entries in the database. However, not all database peptides are unique, and masking some amino acids so that they are indistinguishable will likely decrease the number of unique peptides identifiable.
[0096] By way of example, using database search algorithms, the tryptic peptide sequences in an E. coli proteome database may be converted into strings that represent amino acids with corresponding aptamers and amino acids with no corresponding aptamers. Using the five pools described above, for example, the sample sequence ILPTEQSNAR may be transformed into ILuuuuSuAu ("u" for unknown amino acid) and then queried against the converted database.
[0097] The peptides may be limited in length, for example, to simulate the database queries obtained from sequencing a limited number of amino acids. By way of example, two different simulations were run: one that required each peptide to contain at least as many amino acids as the sequence read length (e.g., Table 3, row 1), and another in which the peptides were at least five amino acids long, with "u" for unknown amino acids concatenated to the end to bring each peptide up to the sequence read length (e.g., Table 3, row 2). The results from several simulation runs are listed in Table 3. The results from these simulations are consistent with probability models. More complex simulations to accurately assess the aptamer/sequence read length combination needed for any given database may be performed.
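The masking and uniqueness query of paragraphs [0096] and [0097] may be illustrated with a short sketch. The function names, the `pad` flag, and the three toy peptides are assumptions made for illustration only; they are not the simulation code or data behind Table 3.

```python
from collections import Counter


def mask(peptide, recognized, read_length=None, pad=False):
    """Replace every residue without an aptamer by 'u'; optionally pad and truncate."""
    masked = "".join(aa if aa in recognized else "u" for aa in peptide)
    if read_length is None:
        return masked
    if pad:
        # Second simulation mode: short peptides are extended with 'u'.
        masked = masked.ljust(read_length, "u")
    return masked[:read_length]


def unique_fraction(peptides, recognized, read_length, pad=False, min_len=5):
    """Fraction of eligible peptides whose masked read occurs exactly once."""
    limit = min_len if pad else read_length
    eligible = [p for p in peptides if len(p) >= limit]
    reads = [mask(p, recognized, read_length, pad) for p in eligible]
    counts = Counter(reads)
    unique = sum(1 for r in reads if counts[r] == 1)
    return unique / len(eligible) if eligible else 0.0


recognized = set("LSIA")  # leucine, serine, isoleucine, alanine
print(mask("ILPTEQSNAR", recognized))  # ILuuuuSuAu

toy_db = ["ILPTEQSNAR", "ILGTDQSNAK", "AAAAALLLLL"]  # hypothetical peptides
print(unique_fraction(toy_db, recognized, read_length=10))
```

In the toy database the first two peptides collapse to the same masked string, so only one of the three reads remains unique.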
Table 3. E. coli simulation runs demonstrating the effect of unknown amino acids on identifying unique peptides.
Figure imgf000034_0001
[0098] From the general probability model and the simulations, aptamers for all twenty amino acids may not be necessary to identify peptides from tryptic peptide databases. For example, even in the case where there are four aptamers, which provide 5 amino acid pools, and the sequence for 10 positions is read, 45% of the E. coli tryptic peptides (greater than 10 units long) are unique (Table 3, row 3). Thus, for the average protein containing 20 peptides there would be less than a 0.001 chance that it would lack a unique peptide or, equivalently, in the entire database only four proteins would be without a unique peptide that the methods herein disclosed could quantitate. Thus, theory and initial simulations show the methods herein disclosed are capable of identifying multitudes of peptides with an aptamer set of less than 20.
[0099] Groups of aptamers having indistinguishable selectivity based on the structural similarity of the amino acids may be used. For instance, tryptophan, phenylalanine, and tyrosine are all very similar in that they are all aromatic amino acids. This information may direct aptamer development by suggesting which amino acids are critical. This information may provide alternative approaches for N-terminal amino acid moiety detection such as using RNA or DNA aptamers having nontraditional nucleotides or antibodies.
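The bound stated in paragraph [0098] can be checked with a few lines of arithmetic. The sketch below assumes that the peptides of a protein are unique or non-unique independently of one another, which is a simplification introduced here; the function names are hypothetical, and the ~4000 protein count is taken from paragraph [0094].

```python
def chance_no_unique_peptide(unique_fraction, peptides_per_protein):
    """Probability that none of a protein's peptides is unique (independence assumed)."""
    return (1.0 - unique_fraction) ** peptides_per_protein


def expected_proteins_without_unique(chance, n_proteins):
    """Expected number of proteins in the database with no unique peptide."""
    return chance * n_proteins


p = chance_no_unique_peptide(0.45, 20)
print(p < 0.001)  # True

# Using the 0.001 bound from the text rather than the smaller computed value.
print(expected_proteins_without_unique(0.001, 4000))
```

Under the independence assumption the computed chance is well below the 0.001 bound quoted in the text, so the figure of four proteins follows from applying that bound to roughly 4000 proteins.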
[00100] Thus, a component in a sample may be matched to a member of a database even where the determined sequence is missing an amino acid. An amino acid may be missing because of lack of detection, because the amino acid remains through more than one cycle as described herein, or because of miscleavage or lack of PITC formation. Different probabilities for each of those events occurring only once per peptide may be used in the estimate. The estimated rates for incorporation of the PITC label onto an amino acid and for cleavage are high, typically greater than 95%. Thus, probabilities in the range of 90% or greater success may be used. The chance of aptamers not binding to an epitope may be determined, for example, by empirical methods.
[00101] The effect of a limited (less than 20) aptamer set may be estimated by ranking the ten most informative amino acids, that is, those whose absence may have the greatest negative impact on the number of identifiable peptides. These aptamers may then be developed and used. For example, if it is found that lack of a leucine aptamer limits the number of unique peptides in the B. anthracis proteome to 10,000, but the lack of a cysteine aptamer makes that limit 30,000, developing the leucine aptamer may be deemed of greater importance.
[00102] Software may be used to perform direct sequence comparison or sequence alignments. Aligning the sequences may not be necessary using the method herein disclosed because only a comparison to a finite number of known sequences, where the starting point for comparison will almost always be the first amino acid in the sequences, would generally be performed. The most current databases may be downloaded from external sources, for example, for B. anthracis or for any combination of bacterial proteomes and genomes.
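The ranking described in paragraph [00101] might be sketched as a leave-one-out comparison: each aptamer is removed in turn from a candidate set, and the number of peptides that remain unique is counted. The candidate set, the function names, and the toy peptides below are hypothetical, and leave-one-out is only one possible way to score information content.

```python
from collections import Counter


def count_unique(peptides, recognized, read_length):
    """Number of peptides whose masked read of `read_length` occurs exactly once."""
    reads = [
        "".join(aa if aa in recognized else "u" for aa in p[:read_length])
        for p in peptides
    ]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1)


def rank_aptamers(peptides, candidate_set, read_length):
    """Score each aptamer by the unique peptides left when it alone is missing.

    The lowest score comes first: losing that aptamer hurts identification most.
    """
    scores = {
        aa: count_unique(peptides, candidate_set - {aa}, read_length)
        for aa in sorted(candidate_set)
    }
    return sorted(scores.items(), key=lambda item: item[1])


toy_db = ["LGGK", "PGGK", "LCGK", "LGCR"]  # hypothetical peptides
print(rank_aptamers(toy_db, {"L", "C"}, read_length=4))  # [('C', 1), ('L', 2)]
```

In this toy set, removing the cysteine aptamer leaves one unique peptide while removing the leucine aptamer leaves two, so cysteine would be ranked as the more informative of the two.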
[00103] A single aptamer that recognizes several different N-terminal amino acid moieties may be used. For example, one aptamer that recognizes two N-terminal amino acid moieties, or 5 aptamers that each recognize 4 N-terminal amino acid moieties, may be used in the methods herein described. The use of five aptamers with four N-terminal amino acid moiety members each roughly correlates with probability based calculations suggesting that approximately four aptamers and 10 N-terminal amino acids sequenced may be needed to identify an average peptide from a tryptically digested database, for example, for a known organism such as B. anthracis.
[00104] The minimal information needed to accurately identify a peptide from partial sequencing using simulation and computer modeling may be determined theoretically and/or statistically. Initial bioinformatic analysis may provide simulations of database searching using limited and partial amino acid sequences for determining the optimal amino acids to develop detectable aptamers.
[00105] Peptides other than those in a protein database that may be present in a sample (i.e., environmental clutter and near neighbor organisms) may affect the false positive rate. Therefore, simulations may be used to determine expected false positive and false negative rates under "less-than-ideal" conditions, as may be expected in real world applications of the methods herein disclosed.
[00106] The methods herein disclosed may reduce false positives typically associated with current peptide sequencing and protein determination techniques (for example, the influence of having peptides in the sample not present in the database). For example, using software simulation tools, randomly generated tryptic peptides and tryptic peptides from C. elegans (random organism to determine false positive rates) that have no exact match in a test database may be searched against the test database. By varying the lengths of the query peptide sequences, it may be determinable how often false positives occur and at what peptide length these spurious hits may be eliminated using standard bioinformatics techniques. Thus, by determining false positive levels and analyzing the minimal number of sequenced amino acids necessary from a peptide it may be possible to uniquely identify the peptide from a group of homologs or other related biological organisms.
Example 2: N-terminal Amino Acid Aminopeptidase-based Sequencing
Mutant Aminopeptidase Generation
[00107] Nucleic acid molecules encoding aminopeptidases from Escherichia coli may be cloned into the Invitrogen Gateway system for analysis and expression. Such peptidases include leucine aminopeptidase (PepA), methionine aminopeptidase (MAP), and the aminopeptidase ypdE (ypdE). The peptidases may initially be expressed in E. coli with 6xHis Tags, added via the Gateway system, to facilitate purification. The aminopeptidase activity may be verified with L-amino acid 4-nitroanilides, and different rate constants for the various L-amino acids may be determined. A fluorescent version of the amino acid cleavage reagent may be made, for example, using Rhodamine 110, which becomes non-fluorescent when amino acids are attached to its two free amines. The wild-type peptidases' response to this compound may be determined.
[00108] Mutagenesis of the aminopeptidases may be performed via site directed mutagenesis and random mutagenesis, both in combination and separately. With site directed mutagenesis, PCR primers are generated that cover the reaction sites of the proteases. These primers may be used to mutate specific amino acids in the active site to different amino acids. These mutants may be referred to as SDAPMl. Random mutagenesis may be performed using, for example, the Mutazyme random mutagenesis kit (Stratagene) or the Mn2+ PCR method (Leung, D. W., et al., Technique, (1989) 1:11-15). Mutation rates of 1 per 10000 bases may be used so that most mutants contain 1 - 2 mutations per PCR product. These mutants may be referred to as RMAP. These two pools of mutants may be mixed for the screening and selection process.
[00109] The in vitro compartmentalization methodology may be used for screening and selection of mutated sequences having altered peptidase activity. A water/oil emulsion is generated using mineral oil, Span-80, Tween-80, an in vitro transcription and translation kit, and an appropriate Rhodamine 110 conjugate. The mutant pool is then dispersed in the approximately 3 μm microdroplets to limit the number of microdroplets that contain more than 1 mutant. The emulsion is then heated at 37° C to synthesize the mutant aminopeptidases in each compartment and allow the peptidase to digest the screening reagent. Microdroplets that contain fluorescence may be isolated by flow cytometry into 96-well plates. The isolated mutants may then be tested again for the ability to digest the target molecule and for general specificity to other amino acids. Mutants having acceptable characteristics, that is, those that digest the target with the highest kinetics while being the most specific to the target molecule, may be used for the next round of random mutagenesis. Each round of selection and screening should produce mutant aminopeptidases with higher kinetics and specificity. Generally this process should take 3 - 7 rounds of mutant generation and screening.
[00110] Initially, a mutant that recognizes only biotinylated N-terminal amino acid(s) and has a broad specificity for most or all amino acids should be isolated. This mutant would then be used to generate other mutants that have high specificity for individual or small groups of biotinylated amino acids. Individual peptidases to all 20 amino acids are not necessary for the protease sequencing strategy to be effective. The number of residues sequenced from a peptide and the size of the searched database will determine how many specific aminopeptidases are needed. Table 2 may also be used to determine the number of aminopeptidases that may be needed for sequencing.
Aminopeptidase based sequencing
[00111] The aminopeptidase based sequencing follows a logical progression of five experimental steps (Figure 3); some of these steps are repeated. Generally, the first step is to prepare the sample and conjugate the peptides to a microscope slide. The second step involves the biotinylation of the N-terminus of each conjugated peptide. This may require that cysteine and lysine side chains be blocked to prevent biotin labeling. The third step detects the presence of the biotin with a fluorescently labeled protein such as streptavidin or an antibody. The fourth step adds an aminopeptidase specific for a biotinylated amino acid(s). Steps 3 and 4 are repeated for each aminopeptidase, with a final biotin detection step at the end. These two steps eliminate an N-terminal amino acid(s) from each peptide on the array and detect which aminopeptidase removed the amino acid(s). Steps 2 through 4 are repeated between about 4 and 15 times. The fifth step analyzes the sequential image files to determine which amino acid(s) is at, or most probably at, each position of the sequenced peptides. This process allows for both the identification (sequencing) and quantitation (counting the number of times each peptide was found on the slide) of each peptide.
[00112] In the first step, a protein sample is prepared for analysis by proteolytic digestion followed by desalting. Tissue extracted from an adult rat is homogenized on ice for 15 in Lysis Buffer (25 mM Tris pH 8.0, 5 mM DTT, 8 M Urea, 100 mM NaCl). The lysate is centrifuged at 15,000 x g for 10 minutes. The sample is then heated to 37° C for 1 hour to reduce disulfide bonds. Finally, the sample is diluted 7:1 with Digestion Buffer (25 mM Tris pH 7.6). Sequencing grade trypsin is added to the sample at a ratio of 50:1 for sample to trypsin. The reaction is then incubated overnight at 37° C. The next morning the sample is desalted with a SepPak (Millipore) following the manufacturer's instructions. The sample is then lyophilized to a dry powder and resuspended in 80% pyridine. A slide is coated with phenyl-biisothiocyanate such that one of the isothiocyanate groups is covalently linked to the slide. The remaining group will react with both lysines and the N-termini of the peptides. The sample is then placed on the slide and allowed to react for 2 hours at room temperature. The N-terminal amino acid is then cleaved from the peptides by the addition of 95% TFA. This reaction will leave the peptides bound by the isothiocyanate attached to a lysine. Non-lysine containing peptides will not be bound. Other reaction chemistries can be used to attach the peptides to the slide, such as a methyl methanethiosulfonate group to react with sulfhydryls or a p-hydroxyphenylglyoxal group to react with arginines. The slide is then washed in 0.1 M ethanolamine, 25 mM Tris pH 8.0 to react with the remaining isothiocyanate groups. The slide should then be washed three times for 1 minute in 50 mM phosphate buffer pH 8.5.
[00113] The next step in the aminopeptidase-based sequencing is to add biotin to all free amines, which include the amine termini of the peptides. This reaction is accomplished by activating the carboxyl group of biotin with sulfo-NHS and EDC (1-Ethyl-3-[3-dimethylaminopropyl]carbodiimide Hydrochloride) in 50 mM MES buffer at pH 4.5. The pH is then adjusted to 8.5 and the sulfo-NHS-biotin is added to the peptide conjugated slide. This reaction is incubated at 50° C for 30 minutes. The residual unreacted biotin is then washed away using a buffer of 25 mM Tris pH 8.0.
[00114] The third step in the aminopeptidase-based sequencing is to incubate the microscope slide with fluorescently labeled streptavidin and acquire digital images of the areas showing where the streptavidin binds. 
After removal of the excess biotin reagent, a solution of 20 microgram/mL Cy3 labeled streptavidin in S-Buffer (25 mM Tris pH 8.0, 100 mM NaCl, 0.1% Tween-20) is incubated on the slide for 15 minutes at 25° C. The excess streptavidin is then washed off the slide with three 1-minute washes with S-Buffer. A digital image of 1 or more areas of the slide is taken with a motorized fluorescent microscope system (e.g., total internal reflectance fluorescence, TIRF). To help orient the images before analysis, a different fluorescent label can be conjugated to the slide in a unique pattern.
[00115] The fourth step is to digest the N-terminal biotinylated amino acids from the peptides on the slide. The first of the specific biotin aminopeptidases will be incubated on the slide for about 10 minutes at 37° C. The exact time that each individual aminopeptidase mutant is incubated on the slide will depend on the reaction kinetics of that particular mutant. Step three is repeated, and the next in the series of aminopeptidase mutants is then incubated on the slide, followed again by step three. This cycle will continue to be repeated until all the aminopeptidase mutants have been used. The exact order of use of the mutants will in part depend on their specificity and the logical progression of the aminopeptidases from the most specific to the least specific. The last aminopeptidase should be one that removes all remaining N-terminal biotinylated amino acids; otherwise, sequencing of those peptides that still have a biotin attached stops.
[00116] Steps 2 through 4 are repeated between 4 - 20 times, and each cycle identifies the current N-terminal amino acid for each peptide and then removes that amino acid. This process is how the sequence information and identity of each peptide is determined. 
The exact number of residues that need to be sequenced is dependent on the number and specificity of the aminopeptidases and the complexity of the sample and database searched. Bioinformatics simulations for the aptamer based sequencing method apply here as well and provide an estimate of the number of residues that will need to be sequenced.
[00117] The fifth step may be automated by computer software and image analysis algorithms but may also be performed manually as follows. Each set of images from the cycle of steps 2 - 4 corresponds to a position in the peptide. The first cycle represents the second residue from the N-terminal end of the original tryptic peptide, as the first amino acid was removed during the isothiocyanate based attachment step. Each subsequent cycle represents the residue at position 1 + the current cycle number, so cycle 5 represents the sixth residue in the peptides. The first fluorescent image shows which peptides currently have an attached biotin and what their relative positions are on the slide. The images before and after each aminopeptidase cycle demonstrate whether the biotin labeled amino acid was cleaved by the current peptidase. For example, if the image before the aminopeptidase addition shows the biotin still attached at a specific position on the slide and the image after incubation with aminopeptidase shows loss of that fluorescent signal, then it is expected that the amino acid(s) recognized by that aminopeptidase are at the current residue position. The potential amino acids determined by aminopeptidase treatment for each cycle are recorded based on the X and Y position within the image, corrected for any slight offset caused by the motorized stage movement using the patterned stationary fluorophore. After the sequence information at each position for each cycle is compiled, the sequence, even if only partial, can be used to search a database of theoretical tryptic peptides for the best match. 
The tryptic database is built based on the organism(s) that the user wishes to search by taking all the known proteins in that organism(s) proteome and calculating an in silico tryptic digest of those proteins.
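The in silico tryptic digest mentioned above can be sketched as follows. The cleavage rule used here (cut after lysine or arginine unless the next residue is proline) is a commonly used convention and is an assumption, since the text does not state the rule; the function names, the `min_length` parameter, and the toy protein strings are likewise hypothetical.

```python
def tryptic_digest(protein, min_length=1):
    """Cleave after K or R, except when the following residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


def build_peptide_database(proteins, min_length=1):
    """Digest every protein and keep one copy of each distinct peptide."""
    peptides = set()
    for protein in proteins:
        peptides.update(tryptic_digest(protein, min_length))
    return sorted(peptides)


print(tryptic_digest("MKTAYRPLLKGG"))  # ['MK', 'TAYRPLLK', 'GG']
print(build_peptide_database(["MKTAYRPLLKGG", "GGMKAAAAK"], min_length=2))
```

Storing the peptides in a set reduces sequences shared between proteins or organisms to a single entry before the database is searched.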
Example 3: Identification of Peptides Using Peptide Sequences
[00118] A bioinformatics simulation was performed to predict the amount of peptide sequence information that may be needed to identify a peptide. The study was designed to address the question in terms of the specific sequence information available when some amino acids may not be distinguishable by an aptamer based recognition approach. These calculations included analysis of problems that could arise during sequencing that may or may not be specific to the CIMPS methodology. In addition, the question of potential false positive rates at both the peptide and organism level was also taken into consideration. Finally, the ability of the methods to differentiate closely related species of bacteria was also examined.
[00119] The code to run the bioinformatics simulations was written in the C programming language using the LabWindows/CVI programming environment. Briefly, the FASTA-format protein database for a given bacterium or group of organisms was downloaded from NCBI. The database was then digested in silico with trypsin to generate a list of peptides of at least 4 amino acids in length. For multiple organism databases, all peptides that were present multiple times in the data set were reduced to just one instance of that sequence. The sequence was then copied to an adjoining text. To search this peptide database, each peptide in the database was searched against the whole database using various parameters to see if the peptide was unique under a given set of conditions. The possible parameters included different read lengths, different combinations of recognizable amino acids, combinations of amino acids that could not be distinguished from each other, and/or forcing each peptide to have a single missed aptamer in the sequence, a single missed phenylisothiocyanate (PITC) addition in the sequence, or a single missed PITC cleavage in the sequence. 
All of these results were saved as tab-delimited files and plotted using Microsoft Excel.
[00120] To calculate false positive (FP) rates for the different errors that could occur during the concurrent identification of multiple peptide methods, all input peptides were assumed to have the possibility of sequencing error. Thus, the false positive rate for a given error ($FP_{\text{Error}}$) is

$$FP_{\text{Error}} = \frac{\text{Number of Misidentified Peptides}}{\text{Number of Peptides Searched}}$$

[00121] The false positive rate for any one peptide ($FP_{\text{1pep}}$) is then given by:

$$FP_{\text{1pep}} = FP_{\text{Error}} \times \text{Calculated Number of Times Error Occurs}$$

[00122] When multiple versions of that unique peptide sequence are present, the false positive rate ($FP_{\text{unique}}$) is given by:

$$FP_{\text{unique}} = FP_{\text{1pep}}^{\,\text{Number of Times That Sequence Is Identified}}$$

[00123] The false positive rate for identifying a given organism ($FP_{\text{organism}}$) is given by:

$$FP_{\text{organism}} = FP_{\text{1pep}}^{\,\text{Total Number of Peptides From Organism Identified}}$$
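The four false positive expressions of paragraphs [00120] through [00123] can be written out as a short sketch. The function and parameter names are hypothetical, the "calculated number of times the error occurs" is treated here as a per-peptide error frequency (an interpretation, not something the text states), and the input numbers are invented purely to show the arithmetic.

```python
def fp_error(misidentified, searched):
    """False positive rate for a given error type."""
    return misidentified / searched


def fp_one_peptide(fp_err, error_frequency):
    """Rate for any one peptide: error-specific rate scaled by how often the error occurs."""
    return fp_err * error_frequency


def fp_unique(fp_1pep, times_identified):
    """Rate when the same unique sequence is identified several times."""
    return fp_1pep ** times_identified


def fp_organism(fp_1pep, peptides_identified):
    """Rate for calling an organism from several identified peptides."""
    return fp_1pep ** peptides_identified


# Invented inputs: 50 of 1000 searched peptides misidentified, error frequency 0.1.
err = fp_error(50, 1000)
one = fp_one_peptide(err, 0.1)
print(err, one)
print(fp_unique(one, 2), fp_organism(one, 3))
```

Because the per-peptide rate is raised to a power, each additional independent identification of a sequence or of an organism's peptides lowers the corresponding false positive rate multiplicatively.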
[00124] These false positive rates are based solely on the expected sequencing error rates and currently do not take into account the total number of peptides sequenced. This additional parameter will be added in the future and will need to be estimated from other data.
[00125] The organisms used to create the peptide databases for the simulations are as follows. For the single organism study, Escherichia coli strain K-12 was used to create the peptide database and will be referred to as E. coli. For the multiorganism simulations, the following organisms were used:
Bacillus anthracis (Ames)
Bacillus cereus (ATCC 10987)
Bacillus thuringiensis (konkukian)
Brucella suis (1330)
Burkholderia pseudomallei (1106a)
Francisella tularensis (FSC 198)
Salmonella typhi
Shigella dysenteriae
Vibrio cholerae
Yersinia pestis (KIM)
[00126] For the simulations of the effect of very similar bacteria on the methods, two strains each of Bacillus anthracis and Bacillus cereus were used.
Bacillus anthracis (Ames)
Bacillus anthracis (Sterne)
Bacillus cereus (ATCC 10987)
Bacillus cereus (ZK)
[00127] The results of these methods identify amino acid information content, multiple aptamers, multiple aptamers at different read lengths, aptamers to multiple amino acids, false positive (FPpep) rate for different methods, false positive (FPpop) rates, multiple aptamers at 10 res, multiple aptamers at different read lengths, and other false positive rates.
[00128] Other embodiments within the scope of the claims herein will be apparent to one skilled in the art from consideration of the specification or practice of the invention as disclosed herein. It is intended that the specification, together with the examples, be considered to be exemplary only, with the scope and spirit of the invention being indicated by the claims.

Claims

WE CLAIM:
1. A process for simultaneously identifying N-terminal amino acids of two or more polypeptides, the process comprising:
(i) forming complexes between N-terminal amino acid moieties of two or more polypeptides with complexing agents capable of binding to the N-terminal amino acid moieties, the complexing agents having a detectable label; and
(ii) detecting the detectable labels in the complexes in order to identify the N-terminal amino acids of the polypeptides.
2. The process of claim 1, wherein the complexing agents are selected from the group consisting of aptamers, antibodies, peptides, proteins, DNA, RNA, PNA, GNA, TNA and peptoids.
3. The process of claim 1, wherein the complexing agents are specific to an N-terminal amino acid moiety.
4. The process of claim 1, wherein the complexing agents are aptamers.
5. The process of claim 4, wherein the aptamers are ribonucleic acid oligomers.
6. The process of claim 4, wherein the aptamers are deoxyribonucleic acid oligomers.
7. The process of claim 1, wherein the labels are selected from the group consisting of radiolabels, chemiluminescent labels, luminescent labels, fluorophores, quantum dots, electrochemical tags, colorimetric labels, or colloidal metal particles.
8. The process of claim 1, wherein the step of detecting the complexes comprises fluorescence microscopy.
9. The process of claim 8, wherein the step of detecting the complexes comprises total internal reflectance microscopy.
10. The process of claim 1, wherein the polypeptides are bound to a surface.
11. The process of claim 1, wherein the N-terminal amino acid moieties comprise a phenylthiocarbamoyl group.
12. The process of claim 1, wherein the N-terminal amino acid moieties comprise a post-translationally or chemically modified amino acid.
13. A process for determining at least a portion of the amino acid sequence of a polypeptide of interest, the process comprising the steps of:
(i) forming a complex between a N-terminal amino acid moiety of the polypeptide and a complexing agent having specificity for the N-terminal amino acid moiety of the polypeptide, the complexing agent selected from a set of complexing agents, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection;
(ii) detecting the detectable label in the complex in order to identify the N-terminal amino acid moiety of the polypeptide;
(iii) removing the N-terminal amino acid moiety from the polypeptide; and
(iv) repeating steps (i) and (ii) in order to determine at least a portion of the amino acid sequence of the polypeptide of interest.
14. The process of claim 13, wherein the steps (i) through (iii) are repeated.
15. The process of claim 13, wherein step (iii) further comprises removing the complexing agent.
16. The process of claim 13, wherein the polypeptide is bound to a surface.
17. The process of claim 13, wherein each of the complexing agents of the set is specific for a different N-terminal amino acid moiety.
18. The process of claim 13, wherein the complexing agent in the complex formed in step (i) is selected from the group consisting of aptamer, antibody, peptide, protein, DNA, RNA, PNA, GNA, TNA and peptoid.
19. The process of claim 18, wherein the aptamer is a ribonucleic acid oligomer.
20. The process of claim 18, wherein the aptamer is a deoxyribonucleic acid oligomer.
21. The process of claim 13, wherein the label is a radiolabel, chemiluminescent, luminescent, fluorophore, quantum dot, electrochemical tag, colorimetric, or colloidal metal particle.
22. The process of claim 13, wherein step (ii) comprises fluorescence microscopy.
23. The process of claim 22, wherein step (ii) comprises total internal reflectance microscopy.
24. The process of claim 13, wherein the N-terminal amino acid moiety is a phenylthiocarbamoyl group.
25. The process of claim 13, wherein step (iii) comprises contacting the polypeptide with a N-terminal amino acid removing reagent.
26. The process of claim 13, wherein step (iii) comprises Edman degradation.
27. The process of claim 13, wherein the N-terminal amino acid moiety is a post-translationally modified amino acid.
28. The process of claim 16, wherein the surface is an array.
29. A process for determining at least a portion of amino acid sequence of a plurality of polypeptides in a sample, the process comprising the steps of:
(i) bonding at least some of the plurality of polypeptides of the sample, each at a specific location on a surface;
(ii) contacting the surface with one or more complexing agents, the complexing agents selected from a set of complexing agents, wherein each complexing agent in the set (a) has specificity for a different N-terminal amino acid moiety or subset of N-terminal amino acid moieties and (b) comprises a detectable label that is distinguishable from the detectable labels of the other complexing agents in the set upon detection;
(iii) forming complexes between N-terminal amino acid moieties of the at least some of the polypeptides and a complexing agent from the set of complexing agents;
(iv) detecting the detectable label in the complexes of step (iii) at specific locations on the surface in order to identify the N-terminal amino acid moiety at the specific locations on the surface;
(v) removing the N-terminal amino acid moiety from the polypeptides at the specific locations on the surface; and
(vi) repeating steps (iii) through (iv) in order to determine at least a portion of the amino acid sequence of the at least some of the polypeptides at the specific locations on the surface.
30. The process of claim 29, wherein steps (iii) through (v) are repeated.
31. The process of claim 29, wherein each of the complexing agents of the set is specific for a different N-terminal amino acid moiety.
32. The process of claim 29, wherein the polypeptide is a protein.
33. The process of claim 29, wherein the polypeptide is digested.
34. The process of claim 29, further comprising the step of identifying the polypeptide of the sample bound at a specific location on the surface by correlating at least a portion of the amino acid sequence at the specific location with known sequences by performing database searching.
35. The process of claim 29, further comprising determining the proportion of the amount of polypeptide on the surface to the total amount of polypeptide present in the sample.
36. The process of claim 35, further comprising determining the amount of the polypeptide on the surface.
37. The process of claim 29, wherein the C-terminus of the polypeptide is bound to the surface.
38. The process of claim 29, wherein the step (v) comprises contacting the polypeptide with a N-terminal amino acid removing reagent.
39. The process of claim 29, wherein the step (v) comprises Edman degradation.
40. The process of claim 29, wherein the complexing agent in the complex formed in step (iii) is selected from the group consisting of aptamer, antibody, peptide, protein, DNA, RNA, PNA, GNA, TNA and peptoid.
41. The process of claim 40, wherein the aptamer is a ribonucleic acid oligomer.
42. The process of claim 40, wherein the aptamer is a deoxyribonucleic acid oligomer.
43. The process of claim 29, wherein the detectable label is a radiolabel, fluorophore or colloidal metal particle.
44. The process of claim 29, wherein step (iv) comprises fluorescence microscopy.
45. The process of claim 44, wherein step (iv) comprises total internal reflectance microscopy.
46. The process of claim 29, wherein the N-terminal amino acid moiety is a phenylthiocarbamoyl group.
47. The process of claim 29, wherein the N-terminal amino acid moiety is a post-translationally modified amino acid.
48. The process of claim 29, wherein step (v) further comprises removing the complexing agent.
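
As an illustration only, and not as part of the claims, the cycle recited in claim 29 and the database correlation of claim 34 can be sketched in Python. Everything named here is an assumption made for the sketch: the four-residue label table, the spot names, the idealized error-free detection and removal, and the use of a simple substring match as the "database searching" step. Polypeptides are written N-terminus first, consistent with the C-terminus being bound to the surface (claim 37).

```python
from typing import Dict, List

# Hypothetical label set: one distinguishable label per N-terminal residue.
# The claims also allow an agent to recognize a subset of residues.
LABEL_FOR_RESIDUE: Dict[str, str] = {
    "A": "label_1", "G": "label_2", "K": "label_3", "L": "label_4",
}
RESIDUE_FOR_LABEL = {label: res for res, label in LABEL_FOR_RESIDUE.items()}


def detect_label(polypeptide: str) -> str:
    """Steps (iii)-(iv): the agent specific for the current N-terminal
    residue binds, and its label is what the detector reports."""
    return LABEL_FOR_RESIDUE[polypeptide[0]]


def remove_n_terminal(polypeptide: str) -> str:
    """Step (v): idealized removal of the N-terminal residue."""
    return polypeptide[1:]


def sequence_surface(surface: Dict[str, str], cycles: int) -> Dict[str, str]:
    """Run up to `cycles` rounds of bind, detect, remove at every location."""
    reads = {location: "" for location in surface}
    state = dict(surface)  # work on a copy; the input is left unchanged
    for _ in range(cycles):
        for location in list(state):
            peptide = state[location]
            if not peptide:
                continue  # nothing left to read at this location
            label = detect_label(peptide)
            reads[location] += RESIDUE_FOR_LABEL[label]
            state[location] = remove_n_terminal(peptide)
    return reads


def match_database(read: str, database: Dict[str, str]) -> List[str]:
    """Claim 34 (simplified): names of known sequences containing the read."""
    return [name for name, seq in database.items() if read in seq]


# Hypothetical surface with two locations and a two-entry database.
surface = {"spot_1": "GALK", "spot_2": "KLAG"}
reads = sequence_surface(surface, cycles=3)
database = {"protein_X": "MGALKA", "protein_Y": "KLAGG"}
hits = {location: match_database(read, database) for location, read in reads.items()}
```

With these inputs, three cycles read "GAL" at spot_1 and "KLA" at spot_2, which match protein_X and protein_Y respectively. A substring match is used because a digested polypeptide (claim 33) need not start at the N-terminus of its parent protein; a real search would also have to tolerate ambiguous or missing calls.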
PCT/US2009/065086 2008-12-01 2009-11-19 Concurrent identification of multitudes of polypeptides Ceased WO2010065322A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA2745197A CA2745197A1 (en) 2008-12-01 2009-11-19 Concurrent identification of multitudes of polypeptides

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11881808P 2008-12-01 2008-12-01
US61/118,818 2008-12-01

Publications (1)

Publication Number Publication Date
WO2010065322A1 (en) 2010-06-10

Family

ID=42233554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/065086 Ceased WO2010065322A1 (en) 2008-12-01 2009-11-19 Concurrent identification of multitudes of polypeptides

Country Status (2)

Country Link
CA (1) CA2745197A1 (en)
WO (1) WO2010065322A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11268963B2 (en) 2015-10-16 2022-03-08 The Governing Council Of The University Of Toronto Protein sequencing methods and reagents

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030138831A1 (en) * 1999-05-25 2003-07-24 Praelux Incorporated Method for sequencing and characterizing polymeric biomolecules using aptamers and a method for producing aptamers
US20050164264A1 (en) * 2000-08-10 2005-07-28 Nanobiodynamcis Method and system for rapid biomolecular recognition of amino acids and protein sequencing
US20080015117A1 (en) * 2006-07-14 2008-01-17 Ozone Research Frontier Ltd. Reactor for automated protein analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SEO ET AL.: "Four-color DNA sequencing by synthesis on a chip using photocleavable fluorescent nucleotides.", PROC. NATL. ACAD. SCI. USA., vol. 102, no. 17, 26 April 2006 (2006-04-26), pages 5926 - 5931 *

Cited By (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11795497B2 (en) 2004-02-02 2023-10-24 Quantum-Si Incorporated Enrichment of nucleic acid targets
US10975421B2 (en) 2004-02-02 2021-04-13 Quantum-Si Incorporated Enrichment of nucleic acid targets
US11802878B2 (en) 2009-09-25 2023-10-31 The Governing Council Of The University Of Toronto Protein sequencing method and reagents
GB2577626B (en) * 2011-06-23 2020-09-23 Univ Texas Identifying peptides at the single molecule level
GB2510488A (en) * 2011-06-23 2014-08-06 Univ Texas Identifying peptides at the single molecule level
GB2577626A (en) * 2011-06-23 2020-04-01 Univ Texas Identifying peptides at the single molecule level
GB2510488B (en) * 2011-06-23 2020-04-08 Univ Texas Identifying peptides at the single molecule level
DE112012002570B4 (en) 2011-06-23 2023-11-23 Board Of Regents, The University Of Texas System Identifying peptides at the single molecule level
US11105812B2 (en) 2011-06-23 2021-08-31 Board Of Regents, The University Of Texas System Identifying peptides at the single molecule level
WO2012178023A1 (en) * 2011-06-23 2012-12-27 Board Of Regents, The University Of Texas System Identifying peptides at the single molecule level
US9625469B2 (en) 2011-06-23 2017-04-18 Board Of Regents, The University Of Texas System Identifying peptides at the single molecule level
US12379381B2 (en) 2011-06-23 2025-08-05 Board Of Regents, The University Of Texas System Single molecule peptide sequencing
US11435358B2 (en) 2011-06-23 2022-09-06 Board Of Regents, The University Of Texas System Single molecule peptide sequencing
WO2013112745A1 (en) * 2012-01-24 2013-08-01 The Regents Of The University Of Colorado, A Body Corporate Peptide identification and sequencing by single-molecule detection of peptides undergoing degradation
US11162952B2 (en) 2014-09-15 2021-11-02 Board Of Regents, The University Of Texas System Single molecule peptide sequencing
US11001875B2 (en) 2015-05-20 2021-05-11 Quantum-Si Incorporated Methods for nucleic acid sequencing
US11970729B2 (en) 2015-05-20 2024-04-30 Quantum-Si Incorporated Methods for nucleic acid sequencing
US11898196B2 (en) 2015-05-20 2024-02-13 Quantum-Si Incorporated Method for isolating target nucleic acid using heteroduplex binding proteins
US11130986B2 (en) 2015-05-20 2021-09-28 Quantum-Si Incorporated Method for isolating target nucleic acid using heteroduplex binding proteins
US12123878B2 (en) 2016-05-02 2024-10-22 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
EP4299803A2 (en) 2016-05-02 2024-01-03 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US12517133B2 (en) 2016-05-02 2026-01-06 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US12019077B2 (en) 2016-05-02 2024-06-25 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US12320813B2 (en) 2016-05-02 2025-06-03 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US12019078B2 (en) 2016-05-02 2024-06-25 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US12235276B2 (en) 2016-05-02 2025-02-25 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US11959922B2 (en) 2016-05-02 2024-04-16 Encodia, Inc. Macromolecule analysis employing nucleic acid encoding
US11312944B2 (en) 2016-12-19 2022-04-26 Quantum-Si Incorporated Polymerizing enzymes for sequencing reactions
US11224878B2 (en) 2017-05-05 2022-01-18 Quantum-Si Incorporated Substrates having modified surface reactivity and antifouling properties in biological reactions
US12528083B2 (en) 2017-05-05 2026-01-20 Quantum-Si Incorporated Substrates having modified surface reactivity and antifouling properties in biological reactions
US11655504B2 (en) 2017-07-24 2023-05-23 Quantum-Si Incorporated High intensity labeled reactant compositions and methods for sequencing
US12019076B2 (en) 2017-09-28 2024-06-25 Vib Vzw Means and methods for single molecule peptide sequencing
US12422438B2 (en) 2017-09-28 2025-09-23 Vib Vzw Means and methods for single molecule peptide sequencing
US11573238B2 (en) 2017-09-28 2023-02-07 Vib Vzw Means and methods for single molecule peptide sequencing
US12129463B2 (en) 2017-10-31 2024-10-29 Encodia, Inc. Methods and kits using nucleic acid encoding and/or label
US11782062B2 (en) 2017-10-31 2023-10-10 Encodia, Inc. Kits for analysis using nucleic acid encoding and/or label
WO2019089836A1 (en) 2017-10-31 2019-05-09 Encodia, Inc. Kits for analysis using nucleic acid encoding and/or label
US12130291B2 (en) 2017-10-31 2024-10-29 Encodia, Inc. Kits for analysis using nucleic acid encoding and/or label
US12467049B2 (en) 2017-10-31 2025-11-11 Encodia, Inc. Methods and kits using nucleic acid encoding and/or label
US11513126B2 (en) 2017-10-31 2022-11-29 Encodia, Inc. Kits for analysis using nucleic acid encoding and/or label
US12292446B2 (en) 2017-10-31 2025-05-06 Encodia, Inc. Kits for analysis using nucleic acid encoding and/or label
US12196760B2 (en) 2018-07-12 2025-01-14 Board Of Regents, The University Of Texas System Molecular neighborhood detection by oligonucleotides
US12312377B2 (en) 2018-07-13 2025-05-27 Quantum-Si Incorporated Biconjugatable labels and methods of use
US12498379B2 (en) 2018-10-05 2025-12-16 Board Of Regents, The University Of Texas System Solid-phase N-terminal peptide capture and release
US12360114B2 (en) 2018-11-15 2025-07-15 Quantum-Si Incorporated Methods and compositions for protein sequencing
US11959920B2 (en) 2018-11-15 2024-04-16 Quantum-Si Incorporated Methods and compositions for protein sequencing
CN113287020A (en) * 2018-11-15 2021-08-20 宽腾矽公司 Methods and compositions for protein sequencing
US12000835B2 (en) 2018-11-15 2024-06-04 Quantum-Si Incorporated Methods and compositions for protein sequencing
US12259391B2 (en) 2018-11-15 2025-03-25 Quantum-Si Incorporated Methods and compositions for protein sequencing
US12174196B2 (en) 2018-11-15 2024-12-24 Quantum-Si Incorporated Methods and compositions for protein sequencing
WO2020102741A1 (en) * 2018-11-15 2020-05-22 Quantum-Si Incorporated Methods and compositions for protein sequencing
US12055548B2 (en) 2018-11-15 2024-08-06 Quantum-Si Incorporated Methods and compositions for protein sequencing
US11613772B2 (en) 2019-01-23 2023-03-28 Quantum-Si Incorporated High intensity labeled reactant compositions and methods for sequencing
US11427814B2 (en) 2019-03-26 2022-08-30 Encodia, Inc. Modified cleavases, uses thereof and related kits
US11788080B2 (en) 2019-03-26 2023-10-17 Encodia, Inc. Modified cleavases, uses thereof and related kits
EP3947667A4 (en) * 2019-03-26 2022-12-28 Encodia, Inc. Modified cleavases, uses thereof and related kits
WO2020198264A1 (en) * 2019-03-26 2020-10-01 Encodia, Inc. Modified cleavases, uses thereof and related kits
AU2020247918B2 (en) * 2019-03-26 2022-06-30 Encodia, Inc. Modified cleavases, uses thereof and related kits
US11634709B2 (en) 2019-04-30 2023-04-25 Encodia, Inc. Methods for preparing analytes and related kits
US11959105B2 (en) 2019-06-28 2024-04-16 Quantum-Si Incorporated Polymerizing enzymes for sequencing reactions
US11712715B2 (en) 2019-10-11 2023-08-01 Quantum-Si Incorporated Surface modification in the vapor phase
WO2021086908A1 (en) * 2019-10-28 2021-05-06 Quantum-Si Incorporated Methods, kits and devices of preparing samples for multiplex polypeptide sequencing
CN114929887A (en) * 2019-10-28 2022-08-19 宽腾矽公司 Method for sequencing and reconstructing single polypeptide
WO2021086918A1 (en) * 2019-10-28 2021-05-06 Quantum-Si Incorporated Methods of single-polypeptide sequencing and reconstruction
US20210139973A1 (en) * 2019-10-28 2021-05-13 Quantum-Si Incorporated Methods of single-cell polypeptide sequencing
US12011716B2 (en) 2019-10-29 2024-06-18 Quantum-Si Incorporated Peristaltic pumping of fluids and associated methods, systems, and devices
WO2021141924A1 (en) 2020-01-07 2021-07-15 Encodia, Inc. Methods for stable complex formation and related kits
US11358981B2 (en) 2020-01-21 2022-06-14 Quantum-Si Incorporated Compounds and methods for selective c-terminal labeling
WO2021194908A1 (en) 2020-03-24 2021-09-30 Encodia, Inc. Modified dipeptide cleavases, uses thereof and related kits
US12065466B2 (en) 2020-05-20 2024-08-20 Quantum-Si Incorporated Methods and compositions for protein sequencing
EP4196581A1 (en) 2020-08-19 2023-06-21 Encodia, Inc. Sequential encoding methods and related kits
US12188940B2 (en) 2022-07-12 2025-01-07 Abrus Bio, Inc. Determination of protein information by recoding amino acid polymers into DNA polymers
US12474347B2 (en) 2022-08-19 2025-11-18 Abrus Bio, Inc. Determination of protein information by recoding amino acid polymers into DNA polymers

Also Published As

Publication number Publication date
CA2745197A1 (en) 2010-06-10

Similar Documents

Publication Publication Date Title
WO2010065322A1 (en) Concurrent identification of multitudes of polypeptides
US12379380B2 (en) Single-molecule protein and peptide sequencing
US20240302380A1 (en) Single molecule peptide sequencing
US9970932B2 (en) Non-covalent patterned chemical features and use thereof in MALDI-based quality control
US20180372752A1 (en) Protein sequencing method and reagents
Hu et al. Functional protein microarray technology
US20230107647A1 (en) Single molecule peptide sequencing
US20100331200A1 (en) Post translational modification pattern analysis
AU2007231009A1 (en) Protein isoform discrimination and quantitative measurements thereof
JP5350215B2 (en) Method for detecting and / or concentrating analyte proteins and / or analyte peptides in complex protein mixtures
US20230104998A1 (en) Single-molecule protein and peptide sequencing
CN119790193A (en) Determining protein information by recoding amino acid polymers into DNA polymers
EP3384041B1 (en) Method for identification of protease substrates
CN114127281A (en) Proximity interaction analysis
US20210381036A1 (en) Methods and composition for high throughput single molecule protein detection systems
US12416100B2 (en) Devices and methods for display of encoded peptides, polypeptides, and proteins on DNA
US20020106700A1 (en) Method for analyzing proteins

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09830863

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2745197

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09830863

Country of ref document: EP

Kind code of ref document: A1