WO2003046577A1 - System and method for automatic protein sequencing by mass spectrometry

Info

Publication number
WO2003046577A1
WO2003046577A1 PCT/EP2001/014041 EP0114041W WO03046577A1 WO 2003046577 A1 WO2003046577 A1 WO 2003046577A1 EP 0114041 W EP0114041 W EP 0114041W WO 03046577 A1 WO03046577 A1 WO 03046577A1
Authority
WO
WIPO (PCT)
Prior art keywords
peak
mass
peptide
isotopic
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2001/014041
Other languages
English (en)
Inventor
Matthias Wilm
Gitte Jackie Neubauer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Europaisches Laboratorium fuer Molekularbiologie EMBL
Original Assignee
Europaisches Laboratorium fuer Molekularbiologie EMBL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Europaisches Laboratorium fuer Molekularbiologie EMBL
Priority to AU2002218321A: AU2002218321A1 (en)
Priority to CA002468689A: CA2468689A1 (fr)
Priority to PCT/EP2001/014041: WO2003046577A1 (fr)
Priority to JP2003547965A: JP2005510732A (ja)
Publication of WO2003046577A1 (fr)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/02 Details
    • H01J49/04 Arrangements for introducing or extracting samples to be analysed, e.g. vacuum locks; Arrangements for external adjustment of electron- or ion-optical components
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2458/00 Labels used in chemical analysis of biological material
    • G01N2458/15 Non-radioactive isotope labels, e.g. for detection by mass spectrometry

Definitions

  • the present invention relates generally to a computer implemented method of determining the amino acid sequence of a protein by automatic interpretation of mass spectra of isotopically-labeled C-terminal peptide fragments of the protein.
  • BACKGROUND The linear arrangement of amino acids in a protein is elucidated by protein sequencing.
  • Knowledge of the sequence of a protein is essential to the techniques of molecular biology.
  • protein sequence information is a prerequisite for DNA cloning and provides information for making oligonucleotide probes and polymerase chain reaction (PCR) primers.
  • PCR polymerase chain reaction
  • protein sequencing allows the synthesis of peptides to be used in antibody production, enables the identification of proteins of interest, and helps characterize recombinant products.
  • de novo sequencing When the sequence of a peptide sample is deduced without any additional information such as the sequence of a known related peptide, the approach is known as de novo sequencing. Despite the progress in genomic DNA sequencing, de novo sequencing of proteins and peptides is still required in a biological research environment since many experiments are carried out in organisms whose genomes are not sequenced.
  • Chemical sequencing of the C-terminus of a protein can be accomplished by the thiocyanate method (Schlack & Kumpf, Physiol. Chem., (1926) 154:125-170). Although useful for sequencing proteins and peptides that are blocked at the N-terminus, this method also has its drawbacks, including the severity of the reaction conditions and the need to couple the protein to a solid support (Bailey, J. Chromatog. A, (1995), 705:47-65).
  • mass spectrometry has emerged as an attractive alternative to chemical methods and has been used to solve sequencing problems that are not easily handled by conventional techniques of protein chemistry (See, e.g., Carr & Annan, "Overview of Peptide and Protein Analysis by Mass Spectrometry," in Current Protocols in Molecular Biology, Ausubel et al., Eds., John Wiley & Sons, Inc., (1997), 10.21).
  • mass spectrometry the molecular weights of gas-phase ions that are formed from intact neutral molecules are determined by separation based on their mass-to-charge (m/z) ratios.
  • One effective way of sequencing proteins is the use of mass spectrometry to determine the molecular weights of peptides in mixtures, such as those resulting from proteolytic digestion.
  • the digestion of a protein with a particular enzyme, e.g., trypsin cleaves the protein at specific sites whose locations depend on the amino acid sequence of the protein.
  • the result is a collection of peptides that gives rise to a signature mass spectrum, often called a "fingerprint."
  • When m/z values are measured to better than 0.01% accuracy, the amino acid composition of a peptide fragment can be reliably deduced.
  • a fingerprint can be utilized to unambiguously identify a protein, or to verify a translation product by comparing it to information contained in a database of peptide fingerprints of known proteins.
  • Mass spectrometry is not limited to measuring the masses of single species but, through the technique of tandem mass spectrometry (MS/MS), can also reveal structural information, including peptide sequences.
  • MS/MS tandem mass spectrometry
  • further fragmentation of the gas phase ions occurs, either spontaneously, or by collision with gas molecules in so-called “collision induced dissociation” (CID).
  • CID collision induced dissociation
  • tandem mass spectrometry typically uses a first mass analyzer to select a particular peptide ion that it permits to undergo fragmentation, for example by CID, to produce subfragment ions of the parent peptide or peptide fragment.
  • the technique also utilizes a second mass analyzer so that, after initial peptide ionization and ion selection, subfragment ions are separated and analyzed.
  • the resulting mass spectra contain m/z ratios for the subfragments.
  • the fragmentation mechanisms undergone by organic molecules, for example during
  • the fragmentation process is not ideal. Some amide linkages are not cleaved during CID so that the differences between some peaks in the MS/MS spectrum do not correspond to masses of single amino acid residues but to two or more residues.
  • a so-called “spectrum graph” is derived from the measured spectrum by assigning a vertex to each peak and constructing an edge between pairs of vertices whose masses differ by the mass of an amino acid residue, (Dancik, et al, J. Comp. Biol, (1999), 6:327-342).
  • the correct sequence can be inferred from the longest path within the graph but only if noise is efficiently eliminated from the spectrum.
  • this method produces a large number of suggested sequences with a scoring probability associated with each, and relies upon carrying out a graph theoretical technique, the antisymmetric longest path problem, which scales very poorly with increasing peptide length.
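The spectrum-graph construction described above can be illustrated with a minimal sketch. It assumes a deliberately small, hypothetical residue alphabet and mass tolerance `tol`, and it searches for a plain longest path in the resulting directed acyclic graph rather than solving the antisymmetric longest path problem named in the text. All function names and peak values are illustrative only.

```python
from itertools import combinations

# Monoisotopic residue masses in Da; a small illustrative subset, not the full alphabet.
RESIDUES = {"G": 57.02146, "A": 71.03711, "L": 113.08406}

def spectrum_graph(peak_masses, tol=0.01):
    """One vertex per peak; an edge joins two peaks whose gap matches a residue mass."""
    peaks = sorted(peak_masses)
    edges = []
    for i, j in combinations(range(len(peaks)), 2):
        gap = peaks[j] - peaks[i]
        for name, mass in RESIDUES.items():
            if abs(gap - mass) <= tol:
                edges.append((i, j, name))
    return peaks, edges

def longest_path(n_vertices, edges):
    """Residue labels along the longest path; edges always run from low to high index."""
    best = [[] for _ in range(n_vertices)]  # best[j]: labels of the longest path ending at j
    for i, j, name in sorted(edges, key=lambda e: e[1]):
        if len(best[i]) + 1 > len(best[j]):
            best[j] = best[i] + [name]
    return max(best, key=len)

# Four ladder peaks plus one noise peak at 300.0 (hypothetical values).
peaks, edges = spectrum_graph([147.11280, 204.13426, 275.17137, 300.0, 388.25543])
path = longest_path(len(peaks), edges)
```

The noise peak gains no edges here, so it drops out of the path. With a full residue alphabet and real noise, spurious edges become common, which is the difficulty the text describes.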
  • Isotopic labeling of C-terminal peptide fragments, e.g., by enzymatic digestion of a protein in 1:1 ¹⁶O/¹⁸O water, provides a characteristic isotopic distribution for these fragments that can be readily identified (Schnolzer et al, Electrophoresis, (1996), 17:945-953).
  • the principle of the method is to identify C-terminal fragment ions of a peptide in one spectrum by their 1:1 ¹⁶O/¹⁸O isotopic pattern when the peptide has been labeled at its C-terminus to 50% with ¹⁸O isotopes and to 50% with ¹⁶O isotopes before being subjected to a tandem mass spectrometric analysis.
  • peptide subfragment ions having either a ¹⁶O atom or an ¹⁸O atom at the C-terminus
  • peptide subfragment ions having either a ¹⁶O atom at the C-terminus, or one ¹³C atom and one ¹⁵N atom, or two ¹³C atoms, or two ¹⁵N atoms.
  • As the peptides become larger, there is a greater chance for incorporation of the less abundant ¹³C and ¹⁵N isotopes, and the problem of identifying C-terminal peaks for amino acid sequencing becomes increasingly difficult.
  • Mass spectrometry is a more promising technique for protein sequencing because it requires picomolar or even femtomolar amounts of sample and produces highly accurate spectra.
  • difficulties in spectral interpretation are significant for larger peptides and proteins. Accordingly, the present art is in need of an analytical technique that permits the sequence of large peptides to be deduced from mass spectra.
  • the present invention involves the derivation of the amino acid residue sequence of a protein or peptide through the automated analysis of differential scanning mass spectrometry data.
  • the aspect of peptide sequence analysis addressed by the present invention is the automated identification of C-terminal, or y-ion peaks, in the mass spectrometry data. Once y-ion peaks have been identified, peptide sequences can be deduced by calculating mass differences between adjacent y-ion peaks and attributing each mass difference to a specific amino acid residue. Since a mass spectrum of a peptide consists of a large number of peaks, the derivation of the peptide sequence by human inspection of a simple difference between a pair of spectra is usually not straightforward and rarely fast.
  • the subject of the present invention is a computer algorithm for deducing the peptide sequence of a peptide from a pair of MS/MS spectra obtained on a partially isotopically labelled sample.
  • the algorithm seeks to compute a "filtered" spectrum comprising just the C-terminus set of subfragments (the y-ion series), from which it is possible to accurately deduce the amino acid sequence.
  • the present invention involves an apparatus for determining the amino acid residue sequence of a peptide, comprising: an input device configured to accept mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is substantially different from its natural abundance; a processor configured to execute mathematical operations on the mass spectrometry data; and a memory connected to the processor to store: a first set of instructions to direct the processor to generate a probability that a peak in the mass spectrometry data derives from a y-ion subfragment of the peptide wherein the first set of instructions are repeatedly executed for each peak in the mass spectrometry data; a second set of instructions to direct the processor to produce a filtered mass spectrum of the peptide, wherein each peak in the filtered mass spectrum whose intensity is greater than a threshold value, is predicted to correspond to a y-ion subfragment of the peptide; and a third set of instructions to direct the processor to derive and store
  • the mass spectrometry data comprises a first mass spectrum that has signals from subfragment ions in which the isotopic label is both present and absent, and a second mass spectrum in which signals from subfragment ions in which the isotopic label is not present are substantially suppressed.
  • the probability is computed from a product of a first scoring value and a second scoring value, wherein the first scoring value is proportional to the likelihood that a peak in the first mass spectrum arises from an isotopic cluster that comprises a signal from a subfragment ion in which the isotopic label is absent and also a signal from a subfragment ion in which the isotopic label is present in the proportion; and wherein the second scoring value is proportional to the likelihood that a peak in the second mass spectrum arises from an isotopic cluster containing a peak from a subfragment ion in which the isotopic label is present in the proportion and in which a peak from a subfragment ion in which the isotopic label is absent is effectively suppressed relative to the first mass spectrum.
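As a rough illustration of how a product of two scoring values might feed a filtered spectrum, the sketch below takes the two scores as caller-supplied functions. The text does not say how the scores are calculated or how the filtered intensities are formed, so the scoring inputs, the weighting of intensity by probability, and every name here are assumptions.

```python
def filtered_spectrum(peaks, score_first, score_second, threshold):
    """Weight each (m/z, intensity) peak by the product of two scores; keep those above threshold."""
    kept = []
    for mz, intensity in peaks:
        probability = score_first(mz) * score_second(mz)  # product of the two scoring values
        weighted = intensity * probability                # assumption: intensity scaled by probability
        if weighted > threshold:
            kept.append((mz, weighted))
    return kept

# Hypothetical scores for two peaks: only the first looks like a y-ion in both spectra.
first = {147.11: 0.9, 120.08: 0.8}
second = {147.11: 0.9, 120.08: 0.1}
result = filtered_spectrum(
    [(147.11, 100.0), (120.08, 100.0)], first.get, second.get, threshold=50.0
)
```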
  • the present invention additionally involves a method for determining the amino acid residue sequence of a peptide, the method comprising: accepting mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is substantially different from its natural abundance; generating a probability that a peak in the mass spectrometry data .
  • the method for determining the amino acid residue sequence of a peptide is executed by a computer under the control of a program, the computer including a memory for storing the program, an input device configured to accept mass spectrometry data and a processor configured to execute mathematical operations on said mass spectrometry data.
  • Figure 1. A computer system according to the present invention.
  • Figure 2. A quadrupole time of flight mass spectrometer used in the preferred embodiment of the invention.
  • Figure 3. Flow chart of partial isotopic labelling for use with a preferred embodiment of the present invention.
  • Figure 4. Flow chart of a differential scanning method.
  • Figure 7. Spectra showing comparison of unfiltered and filtered peptide subfragment ion mass spectra.
  • Figure 8. Representative mass spectrometer for practicing the invention.
  • protein is used herein in a broad sense which includes, mutatis mutandis, peptides, polypeptides and oligopeptides, and derivatives thereof, such as glycoproteins, lipoproteins, phosphoproteins, and metalloproteins.
  • the aim of the method is to simplify and automate analysis of the MS/MS spectra in such a way that a likely peptide sequence can be proposed.
  • the method is implemented in a computer algorithm. It is based on acquiring not just one, but two, fragment-ion spectra of peptides from a protein sample which has been enzymatically digested in a water mixture comprising known proportions of H₂¹⁸O and H₂¹⁶O.
  • the water mixture is such that the fractional composition of H₂¹⁸O is substantially greater than its natural abundance and the conditions are such that the peptide fragments incorporate ¹⁸O labels at their C-termini in the same proportion as is present in the water mixture.
  • One spectrum is obtained by selecting the entire ¹⁶O/¹⁸O isotopic mixture of the peptide for fragmentation and a second spectrum is obtained for which only ¹⁸O labeled peptide ions are fragmented.
  • the data are analyzed using the computer program product and methods of the present invention in order to identify the peaks which arise from y-ions. Peaks corresponding to C-terminal peptide subfragments can be identified when comparing the two spectra using two criteria.
  • the first criterion is their ¹⁶O/¹⁸O isotopic distribution in the first spectrum which is usually difficult or impossible to recognize unambiguously by visual inspection.
  • the second criterion is the
  • C-terminal ions are identified by having peaks from complete ¹⁶O/¹⁸O isotopic distributions in the first spectrum but only peaks from ¹⁸O isotopes in the second spectrum.
  • Non C-terminal ions have the same isotopic representation in both spectra since they do not contain the ¹⁸O isotope in the proportion introduced by
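The two criteria can be sketched as a simple yes/no test on a pair of peak lists. This is a simplified illustration under stated assumptions, not the probability scoring of the invention: it ignores the natural ¹³C contribution to the heavier peak, and the ratio limits, the suppression limit, the tolerance and all names are hypothetical.

```python
O18_SHIFT = 2.004245  # mass difference between 18O and 16O in Da

def intensity_at(spectrum, mz, tol=0.01):
    """Sum the intensities of (m/z, intensity) pairs lying within tol of mz."""
    return sum(i for m, i in spectrum if abs(m - mz) <= tol)

def looks_c_terminal(full, heavy_only, mz, charge=1,
                     min_ratio=0.5, max_ratio=2.0, suppression=0.2):
    """True if mz shows a 16O/18O doublet in `full` and a suppressed 16O peak in `heavy_only`."""
    shift = O18_SHIFT / charge
    light_1, heavy_1 = intensity_at(full, mz), intensity_at(full, mz + shift)
    light_2, heavy_2 = intensity_at(heavy_only, mz), intensity_at(heavy_only, mz + shift)
    if min(light_1, heavy_1, heavy_2) <= 0:
        return False
    doublet = min_ratio <= heavy_1 / light_1 <= max_ratio  # first criterion
    suppressed = light_2 / heavy_2 <= suppression          # second criterion
    return doublet and suppressed

# Hypothetical peak lists: 147.1128 behaves like a y-ion, 120.0810 does not.
full = [(147.1128, 100), (149.1170, 95), (120.0810, 50), (122.0850, 6)]
heavy_only = [(147.1128, 5), (149.1170, 90), (120.0810, 48), (122.0850, 6)]
y_like = looks_c_terminal(full, heavy_only, 147.1128)
other = looks_c_terminal(full, heavy_only, 120.0810)
```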
  • the peptide sequence can be deduced by calculating the mass difference between adjacent fragments and from their order in the spectrum.
  • the methods and computer program product of the present invention may further comprise the calculation of subtracted and filtered mass spectra.
  • the methods of the present invention may be applied to proteins or peptides of any length, provided that machine resolution permits a well-resolved mass spectrum to be obtained, in particular as long as the different isotopes can be resolved.
  • the number of amino acids which can be read is sequence dependent, so there will be peptides of say 20 amino acids in length for which only 5 amino acids can be read, whereas there may be
  • the invention comprises a system 100 for deducing a peptide sequence from mass spectrometry data obtained from mass spectrometer 130.
  • System 100 comprises a processor 102; a section of memory 104 which will typically include both high
  • an input device 106 for inputting user-specific parameters, which may comprise a keyboard, mouse and/or touch-screen display; an output device 108 for printing or displaying the sequence of the protein or peptide, and at least one bus 110 connecting the processor 102, the memory 104, the input device 106, and the output device
  • system 100 also preferably comprises a network or other communication interface for communicating with other computers as well as other devices.
  • the memory preferably stores an operating system 120 for providing basic system services, a file system 122, an analysis module 128 configured to analyze mass spectrometry data, a cache 126 and optionally a graphical user interface (GUI) 124.
  • an operating system 120 for providing basic system services
  • a file system 122 for providing basic system services
  • an analysis module 128 configured to analyze mass spectrometry data
  • cache 126 for storing mass spectrometry data
  • GUI graphical user interface
  • system 100 acquires mass spectrometry data via data channel 132 from mass spectrometer 130.
  • the mass spectrometer 130 is a triple quadrupole mass spectrometer.
  • the amino acid sequence is determined by calculating the mass differences between adjacent y-ion peaks. Each mass difference corresponds to the mass of one amino acid residue. All amino acids in a peptide chain, except for leucine and isoleucine which have the same mass as each other, may be distinguished.
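Reading a sequence from mass differences can be sketched as follows, assuming the y-ion peaks have already been identified and are singly charged. The residue masses are standard monoisotopic values; the tolerance `tol`, the function names and the example ladder are assumptions made for illustration.

```python
# Monoisotopic residue masses in Da. Leucine and isoleucine share one entry
# because they have the same mass and cannot be told apart this way.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def residue_for(delta, tol=0.01):
    """Return the residue whose mass is closest to delta, or None if outside tol."""
    best = min(RESIDUE_MASS, key=lambda r: abs(RESIDUE_MASS[r] - delta))
    return best if abs(RESIDUE_MASS[best] - delta) <= tol else None

def read_ladder(y_masses, tol=0.01):
    """Label each gap between adjacent y-ion peaks, from lowest to highest mass.

    Each larger y-ion extends the fragment by one residue towards the
    N-terminus, so the labels come out in C-terminal to N-terminal order.
    """
    ordered = sorted(y_masses)
    return [residue_for(b - a, tol) or "?" for a, b in zip(ordered, ordered[1:])]

# Hypothetical singly charged y1..y4 ladder.
ladder = read_ladder([147.11280, 204.13426, 275.17137, 388.25543])
```

A gap that matches no single residue is reported as "?", which is what an uncleaved amide bond would produce.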
  • the entire protein sequence may be determined by concatenating or overlapping separate peptide sequences determined from the spectra of different peptide fragments, using principles well known to one skilled in the art.
  • This system when operated in a laboratory environment in conjunction with mass spectrometry data can provide an efficient and useful method of deducing the amino acid residue sequence of a protein or peptide.
  • a mass spectrometer separates ions according to their m/z ratio, the ratio of their mass, m, to charge, z.
  • a sample is ionized, for example by electron bombardment, creating ions that, in a subsequent stage, are accelerated through an inhomogeneous electromagnetic field towards a detector.
  • the magnetic field perturbs the trajectories of the ions according to their m/z ratio: an ion with a small mass will travel more quickly and be less easily perturbed than a heavier ion; an ion with a small charge will be perturbed more than one with a large charge.
  • a triple quadrupole mass spectrometer is used to acquire peptide subfragment data.
  • An example of such a machine is an API III from Perkin Elmer Sciex (PE-Sciex).
  • PE-Sciex Perkin Elmer Sciex
  • three quadrupoles are used as an ion guide, the mass filter and the collision cell.
  • the typical layout of such a mass spectrometer 300 is shown in Figure 3, though it is understood that variations on the components of such a mass spectrometer are envisaged for practice with the methods of the present invention.
  • precursor ions are produced from an ionization source 304.
  • electrospray ionization is used to produce the precursor ions.
  • the precursor ions are optionally passed through a first quadrupole 306 which acts as an ion guide. This ion guide is not usually a mass-selective quadrupole and is usually only present in triple-quadrupole machines.
  • Precursor ions pass into a mass filter 310 that selects a precursor ion having a particular value of the m/z ratio, or, more generally precursor ions whose m/z ratios lie within a narrow range.
  • mass filter 310 which gives the greatest sensitivity is the quadrupole mass filter.
  • An ion trap can alternatively be used.
  • mass filter 310 is a quadrupole mass filter.
  • the range of m/z ratios transmitted by the quadrupole mass filter is known as the transmission window.
  • mass spectrometer 300 used to acquire peptide subfragment data is a quadrupole time of flight ("Q-TOF") mass spectrometer.
  • Q-TOF quadrupole time of flight
  • An example of such a machine is the "Q-Tof2" by Micromass, in the United Kingdom.
  • Such a machine employs two quadrupoles.
  • a quadrupole 312 is employed as the mass filter for precursor ion selection, and a quadrupole 322 is used in a collision cell 320 where the precursor ion is further fragmented into subfragments.
  • a time of flight (“TOF”) mass analyzer 340 is used to examine the subfragment ions.
  • a representative mass spectrometer design for practicing the invention is also shown as Figure 8.
  • ionization techniques used to produce precursor ions for mass spectrometry analysis. These include, but are not limited to, electron ionization, chemical ionization, field ionization, field desorption, fast-atom bombardment, plasma desorption, laser desorption, and electrospray ionization.
  • MALDI matrix-assisted laser desorption ionization
  • ESI electrospray ionization
  • MALDI is a specific type of laser desorption in which biomolecules are
  • PSD post-source decay
  • the ionization source in use with the present invention, the ionization source
  • ESI precursor ions by ESI, according to which, ions are formed by spraying a dilute solution of biomolecules at atmospheric pressure from the tip of a fine metal capillary.
  • the spray creates a fine mist of droplets that become highly charged in a high electric field.
  • the biomolecules pick up one or more protons from the solvent to form ions with single or multiple positive charges.
  • ESI is used to generate precursor ions.
  • Whereas MALDI can result in extensive fragmentation of the sample and precursor ions, ESI results in little to no fragmentation.
  • samples for ESI are in solution so
  • the technique is ideally suited for coupling with purification techniques, such as HPLC.
  • a quadrupole mass filter 310 is used to select precursor ions.
  • a filter comprises a quadrupole 312, consisting of two pairs of precisely parallel metal rods, with opposite rods being electrically connected.
  • a voltage made up of a direct current potential (“DC”) and an alternating radiofrequency (“RF”) component is applied to each pair of rods. Because ions passing through the quadrupole are alternately attracted to and repulsed from the rods, they have an oscillating trajectory, and only those ions with kinetic
  • the filtered precursor ions having a particular m/z are sent to a collision cell 320.
  • the collision cell comprises the third quadrupole.
  • it typically comprises the second of two quadrupoles. It is understood that many machines that are compatible with the present invention utilize collision cells that comprise quadrupoles. In machines that utilize ion traps, the ion trap itself is a collision cell because ions can be collided with rest gas atoms inside it.
  • In collision cell 320 the filtered precursor ions collide with uncharged gas molecules, such as argon or xenon, or dinitrogen, delivered from a source 314.
  • uncharged gas molecules such as argon or xenon, or dinitrogen
  • the kinetic energy of the precursor ions is partially transformed into vibrational energy, resulting in the breaking of the precursor ions' predominantly weak chemical bonds.
  • Peptide precursor ions preferentially fragment at their peptide amide bonds to produce peptide subfragments.
  • the resulting subfragment ions are analyzed by the mass analyzer 340.
  • the mass analyzer used is a time of flight ("TOF") mass analyzer.
  • TOF time of flight
  • subfragment ions are accelerated through accelerating plates 342 and pass into a region that has no external electric field, known as a drift tube 344. If all of the subfragment ions entering the drift tube have the same kinetic energy, given by ½mv² for an ion of mass m and speed v, then since velocity is inversely proportional to the square root of mass, subfragments with larger mass 346 will travel more slowly than subfragments with smaller mass 348. The heavier subfragment ions will therefore reach the detector 350 at the end of the drift tube at a later time than the lighter subfragment ions.
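The relationship between mass and arrival time can be made concrete with a short calculation. The sketch assumes an idealized linear drift tube in which every ion gains kinetic energy zeU from an accelerating voltage U; the voltage, drift length and function name are hypothetical values chosen for illustration.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
DALTON = 1.66053906660e-27  # atomic mass unit in kilograms

def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Drift time in seconds for an ion whose kinetic energy ½mv² equals z·e·U."""
    kinetic_energy = charge * E_CHARGE * accel_voltage
    speed = math.sqrt(2 * kinetic_energy / (mass_da * DALTON))
    return drift_length / speed

# Hypothetical instrument: 10 kV acceleration, 1 m drift tube, singly charged ions.
t_light = flight_time(100.0, 1, 10_000.0, 1.0)
t_heavy = flight_time(400.0, 1, 10_000.0, 1.0)
```

Because speed scales with the inverse square root of mass, the ion with four times the mass takes twice as long to reach the detector.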
  • TOF analyzers are often used in conjunction with MALDI. TOF analyzers are advantageous in that they have virtually unlimited mass range and high scan rates.
  • the detector 350 is an electron multiplier, wherein the display of the mass spectrum is effectively instantaneous. Detector 350 transmits mass data to computer system 100, via transmission channel 132.
  • A limitation of TOF analyzers is that peaks are broadened because not all members of the same subfragment ion population have the same kinetic energy. Since the initial energy spread is mass dependent, peaks from heavier subfragment ions are broader. As is well known to one skilled in the art, the initial kinetic energy distribution of subfragment ions entering the drift tube can be decreased by increasing the final accelerating voltage. The resolution of the TOF analyzer can also be increased by increasing the length of the drift tube, which increases the time difference between arrivals of ions of different m/z, but also increases the spread of arrival times of ions having the same m/z.
  • the TOF analyzer is a "reflectron" type in which the ions follow a curved path.
  • a reflectron TOF analyzer slows the ions down and turns them round before directing them to the detector. When the ions turn around the slower ones catch up with the faster ones.
  • Mass spectrometry data comprises a number of elements, wherein each element has an intensity value, I, for an m/z value.
  • the data comprises elements across a range of m/z values.
  • a name of the unit widely used for m/z values is "Thomson" (Th).
  • the collection of data comprising intensity values for a range of Thomson is often called a "mass spectrum."
  • the m/z values in a mass spectrum are typically separated from one another by 0.02 Th, but, depending upon resolution, may be separated from one another by 0.01 Th or 0.05 Th.
  • a "peak" in a mass spectrum is defined by a collection of adjacent elements, at which each intensity value is above a threshold intensity value.
  • Mass spectrometry data typically also comprises a background intensity, and many low-intensity pieces of data, often called noise.
  • the threshold intensity value may be chosen so that noise is eliminated from consideration during analysis.
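The definition of a peak as a run of adjacent elements above a threshold translates directly into code. The following is a minimal sketch; the function name, the sample data and the threshold value are assumptions.

```python
def find_peaks(mz, intensity, threshold):
    """Group adjacent elements whose intensity exceeds the threshold into peaks."""
    peaks, current = [], []
    for m, i in zip(mz, intensity):
        if i > threshold:
            current.append((m, i))  # still inside a run above the threshold
        elif current:
            peaks.append(current)   # the run has ended, so close the peak
            current = []
    if current:
        peaks.append(current)       # close a peak that reaches the last element
    return peaks

# Hypothetical spectrum sampled every 0.02 Th.
mz = [100.00, 100.02, 100.04, 100.06, 100.08, 100.10]
intensity = [1, 5, 9, 2, 7, 1]
found = find_peaks(mz, intensity, threshold=3)
```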
  • a peak intensity is proportional to its height, though this approximation may break down for more complex spectra, particularly for heavier ions.
  • the overall intensity of a peak is obtained by calculating the area under the peak.
  • the calculation is achieved by a centroiding method.
  • centroiding for any peak whose width, measured as full-width at half maximum height ("FWHM"), is at least 0.04 Th, data in a window of width 0.08 Th are merged into the peak and added up. Centroiding is not generally good enough for the accuracy needed with the present invention because separate peaks may be accidentally merged.
  • an integration method is employed for calculating peak intensities. This method preferably adds all intensities that are present around a peak within a window of about ±0.02 Th.
  • a different window should be chosen according to the number of charges on the subfragment ion. Accordingly, for a singly charged fragment, the window is preferably 0.04 Th; for a doubly charged fragment, the window is preferably 0.02 Th. It is consistent with the methods of the present invention that other windows may be chosen when carrying out peak integration. Indeed it is also possible that different sized windows may be chosen over different regions of a mass spectrum.
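The integration method with a charge-dependent window can be sketched as below. The text gives 0.04 Th for singly charged and 0.02 Th for doubly charged fragments; scaling the window as 0.04/z for any charge z is an assumption made here, as are the function name and the sample data.

```python
def integrate_peak(mz, intensity, center, charge=1, base_window=0.04):
    """Sum intensities inside a window of total width base_window / charge around center."""
    half_width = base_window / charge / 2
    # A tiny epsilon keeps points on the window edge despite floating-point rounding.
    return sum(i for m, i in zip(mz, intensity) if abs(m - center) <= half_width + 1e-9)

# Hypothetical data sampled every 0.02 Th around a peak at 100.00 Th.
mz = [99.96, 99.98, 100.00, 100.02, 100.04]
intensity = [1, 2, 10, 3, 1]
singly = integrate_peak(mz, intensity, 100.00, charge=1)  # window 0.04 Th
doubly = integrate_peak(mz, intensity, 100.00, charge=2)  # window 0.02 Th
```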
  • the mass spectrum of a given peptide subfragment will comprise a number of closely separated peaks, each of which corresponds to a particular distribution of isotopes amongst its atoms. If the peptide subfragment attains a single charge during ionization, then the closely separated peaks for that subfragment are each separated by approximately one m/z unit.
  • the collection of peaks which correspond to fragments differing from one another only by isotopic variation is called a cluster. With the exception of ¹²C whose mass is defined to be 12.0000 atomic mass units, no isotope has an integer mass.
  • the mass of a peptide molecule with one ¹³C atom is not exactly the same as the mass of the same peptide molecule with one ¹⁷O atom but no ¹³C atoms. Therefore the peaks within a cluster may be poorly resolved and may overlap to a great extent.
  • the mass of those molecules in which every atom is present as the most abundant isotope is called the "monoisotopic mass."
  • the monoisotopic mass of a molecule comprises a sum of the accurate masses for the most abundant isotopes over all the atoms.
  • the peak which corresponds to the monoisotopic mass is typically of lowest mass because the most abundant isotope of each element occurring in a protein or peptide has the lowest mass of all the isotopes. This peak is not always the most intense, however.
  • the intensity distribution of the peaks within a cluster is often called an "envelope" and its shape is the result of many contributing factors. For very large molecules, the peak corresponding to the monoisotopic mass is not necessarily the most intense. The most significant contributor to the isotopic peak pattern for biomolecules is ¹³C.
  • the occurrences of the heavy isotopes of oxygen, nitrogen, and sulfur also contribute to the isotope envelope. Carbon has two principal naturally-occurring isotopes: ¹²C, which has a mass of 12.000000 and a natural abundance of 98.9%; and ¹³C, which has a mass of 13.003355 and a natural abundance of 1.1%.
  • the first peak in the resolved isotopic cluster arises from the all ¹²C-containing ion.
  • the first peak in the isotopic cluster will not be the most intense peak because the all ¹²C-containing ion will no longer be the most abundant, i.e., on average every molecule in the sample will contain at least one atom of ¹³C. In such cases, it may be more useful to consider the most intense peak and refer to it as the "average mass."
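The shift of the most intense peak away from the monoisotopic mass can be illustrated with a binomial model of ¹³C incorporation. The sketch considers carbon only, using the 1.1% abundance quoted above, and ignores the oxygen, nitrogen and sulfur contributions; the carbon counts and the function name are hypothetical.

```python
from math import comb

def c13_envelope(n_carbons, p13=0.011, n_peaks=5):
    """Relative abundance of ions carrying k = 0, 1, ... atoms of 13C (binomial model)."""
    return [
        comb(n_carbons, k) * p13**k * (1 - p13) ** (n_carbons - k)
        for k in range(n_peaks)
    ]

small = c13_envelope(50)   # a small fragment with 50 carbon atoms
large = c13_envelope(200)  # a large fragment with 200 carbon atoms
small_top = small.index(max(small))
large_top = large.index(max(large))
```

In this model the all-¹²C peak stops being the most intense once the fragment holds about 90 or more carbon atoms, which is the point where the expected number of ¹³C atoms approaches one.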
  • a feature of the present invention is the comparison of two mass spectra obtained for the same sample, the two spectra differing from one another by the centering of the
  • the subtraction of one peak from another may not give a baseline value, thus giving rise to small positive or negative peaks.
  • the spectra can be scaled so as to match peak heights to one another, but the scaling factor required may vary over the range of the spectrum.
  • the spectrum of the 18 O containing peptide subfragment is scaled to overlap with the 16 O/ 18 O spectrum. The spectra are divided into
  • Partial Isotopic Labeling of Peptides. The methods of the present invention are for use in conjunction with partial isotopic labeling of peptide fragments of a protein and the differential scanning mass spectrometry technique. Partial isotopic labeling of the C-termini of peptide fragments can be accomplished by methods known to those of skill in the art. A preferred embodiment for use with the present invention is shown in Figure 3. Peptides are labeled by enzymatic digestion of a protein 200 using, inter alia, trypsin, chymotrypsin, or papain, preferably trypsin, in bulk solvent water, a known proportion of which is 18 O-labeled water, i.e., H 2 18 O, step 202.
  • the known proportion of labelled water is substantially different from the proportion of the label found naturally.
  • substantially different means present in an amount that renders contribution from the natural abundance of the isotope insignificant when carrying out mass spectrometry measurements and means present in an amount that facilitates automated analysis of a mass spectrum so that signals from peptides that have incorporated label from the labeled water are readily distinguished.
  • the protein is digested in the presence of 30% by volume 18 O-labeled water, preferably in the presence of 33% by volume 18 O-labeled water, more preferably in the presence of 40% by volume 18 O-labeled water, or most preferably 50% by volume 18 O-labeled water.
  • any known proportion between about 30% and about 75% by volume of 18 O-labeled water is suitable for carrying out the methods of the present invention.
  • Proportions by volume of about 30% to about 75% 18 O-labeled water are substantially different from the natural abundance of 18 O-labeled water.
  • peptide fragments are purified and separated, step 204, by, e.g., gel electrophoresis or HPLC.
  • the peptide fragments 206 that are produced are analysed by mass spectrometry. Accordingly, hereinafter the term peptide will also include the term peptide fragment, as understood to be a peptide that has been produced by fragmentation of some longer peptide.
  • When the enzyme digests the protein, it cleaves a peptide amide bond leaving at least one peptide fragment with a free amino group (N-terminus) and a corresponding peptide fragment with a trailing carbonyl group (C-terminus).
  • a water molecule from the bulk solvent water adds to the C-terminus group to produce a carboxylic acid group.
  • Due to the presence of a known proportion of 18 O-labeled water, a known proportion of the cleaved peptide fragments will have 18 O at the C-terminus.
  • the proportion of cleaved peptide fragments with 18 O at the C-terminus is preferably substantially the same as the proportion of 18 O-labeled water by volume in the bulk solvent water.
  • the known proportion of 18 O-labeled water is 50% by volume.
  • the peptide fragment and each subfragment of the peptide fragment that includes the C-terminus will have the characteristic 1:1 16 O / 18 O isotopic distribution that should be distinguishable in a mass spectrum as two peaks of similar intensity separated by two mass units.
  • the labeling is preferably on the N-terminus.
  • Isotopic labeling at the N-terminus is not as straightforward as isotopic labeling at the C-terminus, which is readily accomplished at the same time as enzymatic digestion.
  • An 15 N-based labeling scheme is not ideal because there are very few practical reactions which could introduce such an isotopic label into the peptide.
  • labeling at the N-terminus is preferably accomplished artificially, for example, by acetylation.
  • Carrying out the acetylation reaction with a mixture of reagents, one ordinary, the other containing a heavier isotope could introduce a mixture of isotopes at the N-terminus.
  • the methods of the present invention are not limited to sequence determination of peptide fragments obtained by enzymatic digestion of a protein.
  • sequence of any peptide that has been subjected to partial isotopic labelling may be determined by the method of the present invention.
  • two MS/MS spectra are obtained for a given peptide fragment 400.
  • a first spectrum, denoted SP1 is obtained, step 402, for the mixture of 16 O and 18 O containing peptide and their respective subfragments.
  • a second spectrum, denoted SP2, is obtained, step 406, for just the 18 O containing peptide and its subfragments.
  • signals for the 16 O containing peptide and its subfragment ions are substantially suppressed.
  • these two spectra are collected on the same peptide sample.
  • the first and second spectra may be obtained in any order but are separated by a step of re-centering the transmission window, step 404.
  • subtraction, step 408, of the two spectra can produce a substantially clean spectrum for the C-terminus series of subfragments of the 16 O containing peptide. Peaks arising from non C-terminal subfragments always have their normal isotopic distribution (irrespective of 18 O labelling through enzymatic digestion) and therefore should not remain when the two spectra are subtracted from one another.
  • the peptide sequence 410 can be obtained from the analysis.
  • a peptide sample usually contains many different species, for example the different peptide fragments which result from enzymatic digestion, or the different isotopically substituted forms of a particular peptide or peptide fragment.
  • the different isotopically substituted forms of a particular peptide or peptide fragment constitute the sample that is introduced into the mass spectrometer.
  • the selection of a precursor ion or precursor ions by appropriate adjustment of the transmission window therefore permits analysis of a particular species or a restricted subset of all of the species.
  • the performance of precursor ion selection from a quadrupole mass filter entails a compromise between resolution and sensitivity. The resolution is determined by the width of the transmission window.
  • Although the highest resolution is obtained from the narrowest window, the highest resolution also requires the highest sensitivity. Therefore, operating the quadrupole mass filter so that it selects a single isotope results in insufficient transmission of precursor ions to permit accurate analysis. That is, at the highest resolution possible, not enough sample is transmitted to give a useful spectrum at the sensitivity levels employed.
  • the transmission window is not uniform, however. That is, ions whose m/z ratios lie within the transmission window are not transmitted with equal intensities. The way in which the intensity varies across the transmission window of a mass filter is called the transmission curve.
  • Differential scanning mass spectrometry is based in part on Applicants' surprising discovery that, because of the shape of the transmission curve of a quadrupole mass filter, the transmission window can be chosen in such a way that ions may effectively be excluded without a concomitant loss of sensitivity.
  • the shape of the transmission curve of a quadrupole mass filter is not symmetric around the selected m/z, but has a sharply rising flank (toward the lower m/z) and an extended, longer tail (toward the higher m/z).
  • the quadrupole mass filter is such that a transmission window corresponding to, e.g., 3 Da, can be chosen. If it is centered at an m/z value corresponding to the monoisotopic mass, it transmits both the 16 O- and 18 O-containing ions of a particular peptide, giving a first spectrum, SP1.
  • the transmission window is then re-centered around a second position, at an m/z value corresponding to one mass unit higher, without changing its width and thereby without reducing the signal-to-noise ratio, in order to obtain a second spectrum, SP2.
  • In its second position, the transmission window effectively prevents transmission of the 16 O-containing ion without affecting transmission of the 18 O-containing ion. Therefore, transmission of the peptide containing the lower molecular weight oxygen isotope at its C-terminus is essentially completely suppressed in the second spectrum, SP2.
  • the second position of the transmission window permits transmission of ions whose masses are two mass units higher than the monoisotopic mass. Although such species include normal isotopic variants of the 16 O-containing species (e.g., those ions containing two 13 C atoms), their contribution is outweighed by the contribution from the peptide ions which have picked up an unnatural proportion of 18 O through enzymatic digestion.
  • the transmission window can be centered at the second position prior to the first position.
  • the selected precursor ions are subsequently passed into a collision cell 320 wherein the precursor ions are fragmented into "subfragments.”
  • Subfragments are also identified herein as “peptide subfragments,” or “subfragment ions.”
  • subfragment ions that are produced from a precursor ion are passed into a mass analyzer 340 and thereafter to a detector 350.
  • In order to accurately assign masses from m/z values in the spectrum, it is usually preferable to calibrate the mass analyzer. As is well known to one skilled in the art, calibration can take the form of recording a spectrum for a sample whose mass is known accurately.
  • a transmission window of 3 Da is not so narrow that unacceptable loss of sensitivity occurs in the resulting spectrum. Therefore, a given fragment of a C-terminal peptide digested in 50% H 2 18 O and 50% H 2 16 O, whose 16 O-containing form has mass m, will give rise to two peaks of approximately equal intensity in the first spectrum, and only one peak in the second spectrum.
  • the two peaks in the first spectrum SP1 correspond to fragments with masses at m and m+2 whereas the single peak in the second spectrum SP2 corresponds to fragment ions with masses m+2.
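The two-spectrum signature described above can be pictured with a deliberately idealized toy: a peak is flagged as a candidate C-terminal fragment when it has a partner of similar height two mass units higher in SP1 and is itself largely absent from SP2. This is only a sketch of the idea, not the scoring procedure of the invention. Spectra are represented as plain dictionaries of nominal mass to intensity, natural isotope peaks are ignored, singly charged ions are assumed, and the function name and both thresholds are hypothetical.

```python
def c_terminal_candidates(sp1, sp2, shift=2, ratio_tol=0.3, suppression=0.2):
    """Return masses m whose SP1/SP2 pattern looks like a 16O/18O doublet.

    sp1, sp2: dicts mapping nominal mass -> intensity (toy representation).
    ratio_tol and suppression are illustrative thresholds, not source values.
    """
    hits = []
    for m, i0 in sorted(sp1.items()):
        partner = sp1.get(m + shift, 0.0)
        if i0 <= 0 or partner <= 0:
            continue  # no m / m+2 pair in the first spectrum
        similar = abs(partner - i0) / max(partner, i0) <= ratio_tol
        suppressed = sp2.get(m, 0.0) <= suppression * i0
        retained = sp2.get(m + shift, 0.0) > 0
        if similar and suppressed and retained:
            hits.append(m)
    return hits


if __name__ == "__main__":
    # Hypothetical intensities: 400/402 is a labeled doublet, 300 and 550 are not.
    sp1 = {300: 50.0, 400: 100.0, 402: 95.0, 550: 40.0}
    sp2 = {300: 48.0, 402: 90.0, 550: 41.0}
    print(c_terminal_candidates(sp1, sp2))
```

With the hypothetical intensities shown, only the peak at 400 satisfies all three conditions.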
  • Mass resolution is often expressed as the ratio m/Δm, where m and m+Δm are the masses of two adjacent peaks of approximately equal intensity to be resolved in the mass spectrum.
  • the differential scanning technique requires the mass analyzer 340 and detector 350 to be able to resolve signals for subfragment ions whose molecular masses differ by at most about one or two Daltons. Specifically, the peak arising from a peptide subfragment with mass m, having a 16 O atom at the C-terminus, and the peak arising from the same peptide subfragment having mass m+2 because of an 18 O atom at the C-terminus, both having the same charge, must be resolvable in the spectrum. The larger the peptide, the larger the mass of the subfragment ions. Therefore, the resolution of the analyzer must be greater for larger peptides, if the m and m+2 peaks are to be resolvable.
  • Precursor ions created by electrospray ionization often have multiple charges and consequently their m/z values are fractions of their masses. Although a doubly-charged subfragment ion will appear at m/z values one half of its mass, it will be necessary to resolve peaks for subfragment ions of mass m and m+2 which are separated by a single m/z unit.
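The arithmetic behind these resolution requirements can be sketched in a few lines. Following the simplification used in the text, m/z is approximated as mass divided by charge, ignoring the mass of the charge carriers. The function names are illustrative assumptions.

```python
def doublet_spacing(charge, label_shift=2.0):
    """m/z separation between the 16O and 18O forms of the same fragment."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return label_shift / charge


def required_resolution(mass, charge=1, label_shift=2.0):
    """Resolution m/(delta m) needed to separate the m and m+2 peaks.

    Uses m/z ~ mass / charge (charge-carrier masses ignored, an assumption).
    """
    mz = mass / charge
    return mz / doublet_spacing(charge, label_shift)


if __name__ == "__main__":
    print(doublet_spacing(1), doublet_spacing(2))
    print(required_resolution(1000.0), required_resolution(3000.0, charge=2))
```

In this approximation the required resolution is simply the fragment mass divided by the label shift, so it grows with fragment size and does not depend on the charge state, while the spacing on the m/z axis shrinks from two units to one for a doubly-charged ion.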
  • the resolution of the instrument used to collect data will influence accurate identification of C-terminal ions.
  • Although the methods of the present invention can be practiced on low-resolution machines, such as triple quadrupoles, they are preferably carried out on high resolution machines. If all of the C-terminal peptide subfragment ions can be identified by the characteristic appearance of the m and m+2 doublet in the 16 O/ 18 O spectrum and by corresponding suppression of the 16 O peak in the 18 O spectrum, then the sequence 216 of the peptide or protein can, in principle, be "read" from the spectrum by looking at the m/z differences between successive peaks in the C-terminal series. All amino acids except for leucine and isoleucine, which have the same mass as one another, are distinguishable from each other by their characteristic masses and hence m/z values.
  • peptide sequencing using differential scanning mass spectrometry is more difficult than simple comparison of spectra. Identification of the peaks arising from C-terminal peptide subfragment ions usually cannot be accomplished by visual inspection, particularly for longer peptides.
  • the computer program product and methods of the invention alleviate this difficulty and allow for fast and accurate interpretation of mass spectra acquired using the differential scanning technique, resulting in fast and accurate determination of the previously-unknown amino acid sequence of a protein.
  • the main problem addressed by the algorithms of the present invention is the identification of y-ions in the mass spectrum of a peptide.
  • the overall principle is to compute a filtered spectrum, SS, for the peptide, see Figure 5.
  • the filtered spectrum is effectively a simulated spectrum which contains a peak at a m/z value of m P , if m P corresponds to a y-ion of a 16 O containing peptide.
  • the height of a peak in the filtered spectrum is analogous to an intensity in a measured spectrum, but is calculated by a cumulative multiplication of factors, each of which indicates the likelihood that the peak corresponds to a y-ion.
  • An advantage of a filtered spectrum is that it is also visually pleasing and easy to interpret.
  • the steps that precede production of a filtered spectrum SS are as follows, with reference to Figure 5.
  • the charge on the peptide that gives rise to the spectrum is preferably ascertained.
  • the starting points are the 16 O/ 18 O mass spectrum SP1 500 and the 18 O spectrum SP2 502, from which the charges on the subfragment ions are deduced, step 504.
  • the peak for each subfragment in the 16 O/ 18 O mass spectrum is analyzed to see whether it corresponds to an 18 O-labeled ion, step 506, and a scoring value S1 508 for each peak is deduced.
  • Peaks in the 18 O mass spectrum are also analyzed, step 510, to see whether they represent 16 O containing peptide subfragments whose presence is suppressed in the 18 O mass spectrum relative to the 16 O/ 18 O mass spectrum, to produce a scoring value S2, 512. It is to be understood that steps 506 and 510 may be reversed in order without departing from the scope of the present invention. Finally, scoring values S1 and S2 are combined to produce a filtered spectrum SS, 514.
  • the algorithm utilizes data for the 16 O/ 18 O spectrum, SP1, and the 18 O only spectrum, SP2.
  • the principal task of the algorithm is to produce a scoring dataset, SD, in which every peak in the 16 O/ 18 O spectrum is assigned a probability value that it is a y-ion of a 16 O containing peptide subfragment.
  • the filtered spectrum, SS, is then computed, for every value m P , according to equation (1): SS(m P ) = SP1(m P ) * SD(m P ) (1)
  • the final result of the algorithm is to produce a filtered spectrum which contains computed m/z values for 16 O y-ions, with all other ions screened out. It is to be understood that the methods of the present invention are equally applicable to calculations of filtered spectra that correspond to just parts, or ranges, of the measured spectra. It is not to be construed that the methods of the present invention are limited to calculations of filtered spectra, scoring dataset or scoring values that encompass the entirety of measured spectra for either of the positions of the transmission window.
  • In conjunction with differential scanning mass spectrometry, the identification of y-ions using the computer program product and methods of the present invention is facilitated by recognizing two essential features of y-ions in the spectra.
  • y-ions have a 16 O/ 18 O isotopic distribution in the 16 O/ 18 O spectrum, SP1.
  • the 16 O peaks of y-ions are suppressed in the 18 O spectrum, SP2.
  • the first step is to deduce a mass value, m, of the fragment which gives rise to the peak at position m P .
  • Methods for accomplishing this can be found in: Uttenweiler-Joseph, S., Neubauer, G., Christoforidis, S., Zerial, M., and Wilm, M., "Automated de novo sequencing of proteins using the differential scanning technique," Proteomics, 1(5):668-682, (2001), incorporated herein by reference.
  • the subfragment ion giving rise to the peak at m P may be multiply charged.
  • the electrospray ionization method typically gives rise to multiply charged ions.
  • the overall scoring value, SD(m P ), for a peak at m P , which measures the overall probability that the peak is the first peak of a doublet arising from a partially-labeled peptide subfragment, is computed from a product of two factors, equation 2: SD(m P ) = S1(m P ) * S2(m P ) (2)
  • S1(m P ) is a first scoring value that is a probability calculated by comparing the distribution and intensities of peaks in the envelope around the peak at m P in the 16 O/ 18 O spectrum, with the expected distribution and intensities of peaks for a peptide of the same mass using natural isotopic abundances. Therefore S1(m P ) indicates how likely the peak at m P arises from a fragment with the 16 O/ 18 O ratio resulting from enzymatic digestion in 50% H 2 18 O and 50% H 2 16 O, or, in an alternative embodiment, in a water mixture containing some other proportion of H 2 18 O.
  • S2(m P ) is a second scoring value that is a probability calculated by comparing the intensity of the peak at m P in the 16 O/ 18 O spectrum SP1 with the intensity of the peak at m P in the 18 O spectrum SP2 and evaluating the degree of suppression of this peak in the second spectrum. Therefore S2(m P ) indicates how likely the peak at m P corresponds to the 16 O containing y-ion of a peptide.
  • Calculation of a first scoring value S1 based on expected and observed isotopic distributions.
  • the first step in the method of the present invention calculates a first probability, known as a first scoring value, S1, that a particular peak at position m P arises from the first isotope of a 16 O/ 18 O isotopic cluster in spectrum SP1.
  • the observed isotope envelope comprises contributions from ions whose masses are approximately m 0 +1, m 0 +2, m 0 +3, etc. If the monoisotopic species gives rise to a peak at m P with intensity I 0 , the envelope will comprise successive peaks, denoted (m P +1) with intensity I 1 , (m P +2) with intensity I 2 , (m P +3) with intensity I 3 , and so forth.
  • if the ions contributing to the cluster have a single charge, the successive peaks in the envelope are separated by approximately one m/z unit.
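Because successive isotope peaks differ by about one mass unit, their spacing on the m/z axis is roughly 1/z for an ion of charge z, which gives a simple way to estimate the charge of a cluster. The sketch below assumes that relationship; the function name is hypothetical, and it is not presented as the charge-deduction procedure of step 504.

```python
def charge_from_isotope_spacing(spacing):
    """Estimate the charge state from the m/z gap between adjacent isotope peaks.

    Assumes adjacent isotope peaks differ by about one mass unit, so the
    observed spacing is approximately 1 / charge.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    return max(1, round(1.0 / spacing))


if __name__ == "__main__":
    for gap in (1.0, 0.5, 0.334):
        print(gap, charge_from_isotope_spacing(gap))
```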
  • the highest mass that is usually to be considered depends on the value of m 0 , since larger peptides are expected to incorporate a greater number of heavy isotopes, and will therefore have more significant peaks in the isotope envelope.
  • the observed peak intensities of the isotope envelope, I 0 , I 1 , I 2 and so forth, are usually governed by the natural isotopic abundance.
  • the natural isotopic distribution of carbon, nitrogen, oxygen and sulfur (Table 1) has been a factor that has complicated the interpretation of peptide mass spectra, but in the present invention it can be used to some advantage.
  • Because natural abundances are known, it is straightforward to identify when they are perturbed, for example, by the artificial 16 O/ 18 O ratio arising from enzymatic digestion in a mixture of 50% H 2 18 O and 50% H 2 16 O, and to quantify the extent to which they are perturbed.
  • a fragment of mass M pep has a fragment of mass M pep + n in its isotopic cluster.
  • the intensity, ⁇ , of fragment M pep + n in the envelope can be calculated by addition of the term ⁇ n .
  • the initial values for all the I n are 0 with the exception of I 0 , which is never calculated because it is set to 1.
  • I 0 is the intensity of the monoisotopic species of
  • I n is the intensity of the n'th isotope after the first (the n+1'th isotope altogether).
  • the formulae are approximations derived from average abundances amongst currently known peptide sequences.
  • the most up to date compilations of peptide sequences that are suitable for deriving these formulae include, for example, a "non-redundant" database, updated on a regular basis by the European Bioinformatics Institute (EBI). See for example http://www.ebi.ac.uk/ also available at ftp://ftp.embl-heidelberg.de/pub/databases/nrdb.
  • Another similar database, compiled by NCBI, can be found at http://www.ncbi.nlm.nih.gov/.
  • the peak at m P is the first peak in a y-ion 16 O/ 18 O isotope cluster
  • the observed intensities of the peaks (m P +2), (m P +3) and so forth will be different from those of a subfragment whose oxygen content is that naturally occurring.
  • the characteristic doublet of the 16 O/ 18 O isotopic cluster and the isotope envelope of the 18 O-containing ion will be superimposed onto the isotope envelope of the 16 O-containing ion.
  • the isotopic envelope of a non-y-ion will simply follow the expected naturally occurring form.
  • the theoretically expected peak intensities, denoted I 1 *, I 2 *, I 3 *, and so forth, based on natural abundance of isotopes, for a 16 O-containing y-ion can be calculated using a polynomial expression of the type shown above.
  • the observed and calculated intensities are normalized to I 0 and I 0 * respectively to permit quantitative comparison.
  • the scoring value S1 is a function of the difference between the observed and calculated intensities for each peak in the isotopic envelope, as shown in equation 3:
  • the absolute value of the difference in intensities is calculated from I n , the observed intensity for peak (m P +n), and I n *, the intensity calculated for a peak (m P +n) assuming that the peak at m P arises from a 16 O-containing y-ion.
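The comparison described above can be sketched as two small steps: build the envelope expected for a partially labeled y-ion by superimposing a copy of the natural envelope shifted by two mass units, then take the absolute differences between observed and expected intensities after normalizing each to its first peak. This is an illustrative sketch only. The natural envelope is supplied by the caller rather than computed from the polynomial expressions of the invention, a single charge is assumed, and the function names and example intensities are hypothetical. The scoring function of equation (3) is not reproduced here.

```python
def labeled_envelope(natural, label_fraction=0.5, shift=2):
    """Expected envelope of a 16O/18O mixture, relative to the first peak.

    natural: natural-abundance envelope normalized so natural[0] == 1.
    The 18O form is modeled as the same envelope shifted by `shift` units
    and weighted by label_fraction / (1 - label_fraction).
    """
    ratio = label_fraction / (1.0 - label_fraction)
    out = list(natural) + [0.0] * shift
    for n, x in enumerate(natural):
        out[n + shift] += ratio * x
    return out


def intensity_differences(observed, expected):
    """Absolute differences for peaks (m_P + n), n >= 1, after normalization."""
    obs = [x / observed[0] for x in observed]
    exp = [x / expected[0] for x in expected]
    return [abs(o - e) for o, e in zip(obs[1:], exp[1:])]


if __name__ == "__main__":
    natural = [1.0, 0.5, 0.15, 0.03]          # hypothetical natural envelope
    expected = labeled_envelope(natural)       # 50% labeling assumed
    print(intensity_differences([200, 100, 230, 106], expected))  # y-ion-like
    print(intensity_differences([200, 100, 30, 6], expected))     # unlabeled
```

Small differences, as for the first observed envelope, are what the first scoring value is meant to reward; the unlabeled envelope deviates strongly at (m P +2) and (m P +3).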
  • S1 n (m P ) in equation (3) is the contribution to the scoring value S1 of the peak at m P from the intensity of the peak at (m P +n).
  • There are two parameters in equation (3), which have the following effects: the first is a "strength", i.e., a weight given to the scoring value, adjustable according to how significant this criterion is to be; the second is a "sharpness" parameter affecting how quickly S1 n drops to 0.001 times the strength with increasing intensity difference.
  • the sharpness parameter determines how fast the scoring values drop to 0.001 times the strength. It is preferably not fixed, but calculated from the data itself.
  • the purpose of the scoring function is to multiply peaks which have an 18 O isotope with the scoring strength, and peaks which do not have this isotope with a very small value, 0.001 times the strength, according to a preferred form of equation (3). Since most of the peaks will not have an 18 O isotope (only the C-terminal fragments have it), the average peak should be multiplied with a very small value close to 0.001 times the strength. Thus, the sharpness is preferably chosen such that the average peak is multiplied with about 0.003 times the strength. This means mathematically:
  • — ( ⁇ (5 > l ⁇ avg) 0 with ⁇ avg the average of all values determined from this spectrum.
  • the strength parameter is fixed at the value 10.0.
  • Values of the strength and sharpness parameters are sensitive to the machine employed and the quality of the data. It is within the capability of one skilled in the art to choose values of these parameters different from those given here in order to produce better results, according to the sample and the machine employed.
  • the first scoring value, S1(m P ), is the measure of similarity between observed intensities of peaks in the isotopic envelope around a given peak at m P and the intensities calculated for these peaks assuming that the peak at m P is the first isotope in a 16 O/ 18 O isotopic cluster in SP1.
  • the scoring value takes into account not only the degree of y-ion labeling, but also the natural abundance of isotopes, which have traditionally complicated the mass spectra of large peptides. A small difference between observed and calculated intensities is reflected in a high scoring value, which indicates a high probability that the peak at m P is due to a 16 O-containing y-ion, i.e., the monoisotopic species.
  • S1 values are preferably normalized to 1, by dividing through the entire S1 function by its maximum value after it has been calculated for every peak m P .
  • This step effectively converts the scoring values into probabilities.
  • Calculation of a second scoring value based on the degree of suppression of a peak in the second spectrum.
  • the second procedure in the method of the present invention is to compute a second probability, known as a second scoring value, S2, that a particular peak at m P arises from the first isotope of a 16 O/ 18 O isotopic cluster whose 16 O isotopes are suppressed in spectrum SP2. This calculation is achieved by comparing the two spectra, SP1 and SP2, thereby determining the amount of suppression of the peak in SP2.
  • the transmission window of a quadrupole mass filter can be re-centered to a higher m/z value without being narrowed, so that transmission of a lighter isotope is effectively precluded.
  • Use of a constant transmission window width ensures constant sensitivity.
  • the two different spectra, SP1 from an isotopic mixture of a particular peptide, and SP2 from only the heavy-isotope-containing peptide, have similar signal-to-noise ratios.
  • the peak at m P has intensity denoted by K 0
  • the peak (m P +n) in the same envelope has intensity denoted by K n .
  • the intensities of the peaks in SP1 arising from the peak at m P are normalized to I 0
  • the intensities of the peaks in SP2 arising from the peak at m P are normalized to K 0 . If the peak at m P arises from the first isotope in the 16 O/ 18 O isotopic cluster, K 0 << I 0 because this peak is suppressed in the second spectrum.
  • To calculate the second scoring value, it is desirable to average over the abundant isotopes of a fragment ion which could be 18 O labeled.
  • the m P and (m P +2) peaks are considered: if unlabeled, only the first isotope, at m 0 , is abundant; if labeled, both the isotopes m 0 and m 0 +2 are abundant.
  • For m 0 greater than 1,400 Da, the m P , (m P +1), (m P +2), and (m P +3) peaks are considered: if unlabeled, only the first isotope is abundant; if labeled with 18 O, all of the isotopes m 0 to m 0 +4 are abundant. This is because, as mentioned above, for heavier peptide ions, the contribution of subfragments containing multiple isotopic substituents increases.
  • 1,400 Da is not fixed and other values in the region of about 1,400 Da can be chosen without departing from the spirit ofthe present invention.
  • the average relative isotopic intensity is calculated by taking the average of the intensities of all of the peaks considered.
  • the averages are I(ave) and K(ave) for spectra SP1 and SP2, respectively.
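The averaging step can be sketched as follows. The peak selection mirrors the description above: two peaks (offsets 0 and 2) are averaged for lighter fragments and four peaks (offsets 0 to 3) above the threshold of about 1,400 Da. That the two-peak rule applies at or below the threshold is an assumption, as are the dictionary representation of a spectrum, unit peak spacing (single charge), and all names used here.

```python
def offsets_to_average(m0, threshold=1400.0):
    """Peak offsets (in mass units) included in the average for a fragment of mass m0."""
    return (0, 2) if m0 <= threshold else (0, 1, 2, 3)


def average_intensity(spectrum, m_p, m0, threshold=1400.0):
    """Average intensity of the peaks considered around m_p.

    spectrum: dict mapping nominal m/z -> normalized intensity; missing
    peaks count as zero (a simplification made for this sketch).
    """
    offsets = offsets_to_average(m0, threshold)
    return sum(spectrum.get(m_p + k, 0.0) for k in offsets) / len(offsets)


if __name__ == "__main__":
    # Hypothetical intensities for a light, labeled fragment.
    sp1 = {500: 1.0, 501: 0.3, 502: 1.0}
    sp2 = {500: 0.05, 501: 0.02, 502: 1.0}
    i_ave = average_intensity(sp1, 500, 500.0)
    k_ave = average_intensity(sp2, 500, 500.0)
    print(i_ave, k_ave)
```

The two averages play the roles of I(ave) and K(ave); how they enter the second scoring function is not reproduced here.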
  • one parameter is a scoring weight given to S2(m P ), and the intensity-difference term is the difference in peak intensities between the two spectra, i.e., the peak suppression.
  • the scoring weight parameter is given a value of 5.
  • ⁇ values are on the x-axis
  • S2 values are on the y-axis.
  • a second parameter is a sharpness parameter, since it determines how fast the scoring values drop to 0 if there is no suppression (i.e., the difference in peak intensities is small).
  • a high value of S2(m P ) indicates a high probability that the peak arises from a y-ion.
  • the scoring values are divided through by their maximum value.
  • a filtered spectrum SS may be calculated using equation 8, obtained by substituting equation (2) into equation (1), to calculate an intensity for a peak at each value of m P :
  • SS(m P ) = SP1(m P ) * S1(m P ) * S2(m P ) (8)
  • the procedure described is preferably repeated for every peak in both spectra, or according to choice for as many peaks as are of interest.
  • the scoring functions for every peak depend on peak specific parameters (such as suppression and deviation from the expected isotopic distribution for an 18 O-labelled peak) and on parameters which can only be calculated if all suppressions and all deviations are known (i.e., the averaged values). Therefore, in a preferred embodiment, the calculation of the filtered spectrum starts by ensuring that all deviations and suppressions are calculated for every peak. Subsequently, the scoring values for all peaks are calculated and then the spectrum is multiplied with the two scoring functions. It is therefore preferred that no peak cluster is skipped. All calculations are done for all peaks, always evaluating whether each peak could be the first of a 16 O/ 18 O cluster.
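The final multiplication of equation (8) can be sketched once the two scoring values are available for every peak. The sketch below takes the raw scores as given inputs (the scoring functions themselves are not reproduced), normalizes each set by its maximum as described, and multiplies them into the SP1 intensities. List-based spectra aligned by index and the function names are assumptions made for illustration.

```python
def normalize(scores):
    """Divide every score by the maximum so the largest becomes 1."""
    top = max(scores)
    return [s / top for s in scores] if top > 0 else list(scores)


def filtered_spectrum(sp1, s1_raw, s2_raw):
    """SS = SP1 * S1 * S2 for index-aligned peaks, after normalizing S1 and S2."""
    if not (len(sp1) == len(s1_raw) == len(s2_raw)):
        raise ValueError("all inputs must describe the same peaks")
    s1 = normalize(s1_raw)
    s2 = normalize(s2_raw)
    return [i * a * b for i, a, b in zip(sp1, s1, s2)]


if __name__ == "__main__":
    sp1 = [100.0, 80.0, 60.0]     # hypothetical SP1 intensities
    s1_raw = [10.0, 0.01, 5.0]    # hypothetical raw first scoring values
    s2_raw = [5.0, 5.0, 0.005]    # hypothetical raw second scoring values
    print(filtered_spectrum(sp1, s1_raw, s2_raw))
```

Normalization needs the maximum over all peaks, which is one reason every peak has to be scored before any filtered intensity is computed.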
  • each y-ion in this series contains the C-terminus of the peptide.
  • each y-ion corresponds to a peptide subfragment containing an exact number of amino acid residues. Accordingly, if every peptide amide bond is cleaved in the collision chamber, each y-ion in the series differs from the nearest y-ion in mass by the mass of an amino acid residue.
  • This procedure is repeated for each adjacent pair of y-ion peaks in the mass spectrum.
  • the mass difference is compared to the sums of the masses of all pairs of amino acid residues to search for a match. If a match is found with a pair of amino acid masses, the two amino acid residues are placed in the sequence. The peptide amide bond between this pair of amino acids has not cleaved easily enough in the collision chamber to generate a separate subfragment containing each of the pair of residues.
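A minimal sketch of this matching step is given below. Each gap between adjacent y-ion masses is first compared with single residue masses and, failing that, with sums of residue pairs. The residue masses are standard monoisotopic values and are not taken from this document; the tolerance, the function names, the assumption of singly charged ions, and the example masses are all hypothetical.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in Da; leucine and isoleucine share one entry.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_gap(delta, tol=0.01):
    """Residues (or residue pairs) whose mass matches delta within tol."""
    singles = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
    if singles:
        return singles
    return [
        a + "+" + b
        for a, b in combinations_with_replacement(RESIDUE_MASS, 2)
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - delta) <= tol
    ]


def read_y_series(y_masses, tol=0.01):
    """Candidate assignments for each gap between adjacent y-ion masses."""
    ordered = sorted(y_masses)
    return [match_gap(hi - lo, tol) for lo, hi in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Hypothetical singly charged y-ion masses.
    print(read_y_series([147.11280, 204.13426, 275.17137, 459.25616]))
```

Each gap yields a list of candidates because some assignments cannot be separated by mass alone: leucine and isoleucine always coincide, and some residue pairs have the same total mass as a single residue or as another pair. Reading the series from low to high mass adds residues in the direction away from the C-terminus.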
  • the procedure of matching mass differences between adjacent y-ion peaks is repeated for each distinct peptide or peptide fragment produced by enzymatic digestion of the protein.
  • the sequence of each peptide or peptide fragment is deduced and the sequence of the protein inferred by joining or overlapping the sequences of each fragment, according to methods well known to one skilled in the art (See, for example, Mann, M., "A shortcut to interesting human genes: peptide sequence tags, expressed-sequence tags and computers," Trends in Biochemical Sciences, (1996), 21:494-495).
  • the present invention can be implemented as a computer program product that includes a computer program mechanism embedded in a computer readable storage medium.
  • the computer program product could contain a number of separate program modules that may be stored on a CD-ROM, magnetic disk storage product, or any other computer readable data or program storage product.
  • the software modules in the computer program product may also be distributed electronically, via the Internet or otherwise, by transmission of a computer data signal (in which the software modules are embedded) on a carrier wave.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Hematology (AREA)
  • Analytical Chemistry (AREA)
  • Urology & Nephrology (AREA)
  • Biomedical Technology (AREA)
  • Immunology (AREA)
  • Cell Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Microbiology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a method for deducing the sequence of a protein from the analysis of tandem mass spectrometry data. In this method, partial isotopic labeling is applied to the protein by enzymatic digestion in an aqueous mixture having a non-natural content of H2 18O. The peptide fragments resulting from this digestion are subjected to differential scanning mass spectrometry. The peaks of the spectrum are then analyzed to determine whether or not they arise from isotopically labeled fragments. A filtered spectrum containing only the peaks arising from y-ions is then calculated. The sequence of the peptide is deduced by calculating the mass difference between adjacent y-ion peaks.
PCT/EP2001/014041 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry Ceased WO2003046577A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
AU2002218321A AU2002218321A1 (en) 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry
CA002468689A CA2468689A1 (en) 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry
PCT/EP2001/014041 WO2003046577A1 (en) 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry
JP2003547965A JP2005510732A (en) 2001-11-30 2001-11-30 System and method for automatically sequencing proteins by mass spectrometry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2001/014041 WO2003046577A1 (en) 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry

Publications (1)

Publication Number Publication Date
WO2003046577A1 2003-06-05

Family

ID=8164706

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/014041 Ceased WO2003046577A1 (en) 2001-11-30 2001-11-30 A system and method for automatic protein sequencing by mass spectrometry

Country Status (4)

Country Link
JP (1) JP2005510732A (fr)
AU (1) AU2002218321A1 (fr)
CA (1) CA2468689A1 (fr)
WO (1) WO2003046577A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7230235B2 (en) * 2005-05-05 2007-06-12 Palo Alto Research Center Incorporated Automatic detection of quality spectra
WO2007076606A1 (en) * 2006-01-05 2007-07-12 Mds Analytical Technologies, A Business Unit Of Mds Inc., Doing Business Through Its Sciex Division Mass defect triggered information dependent acquisition
JP4841414B2 (en) * 2006-12-08 2011-12-21 株式会社島津製作所 Amino acid sequence analysis method using mass spectrometry, amino acid sequence analysis apparatus, amino acid sequence analysis program, and recording medium storing the amino acid sequence analysis program
ES2609669T3 (en) * 2010-02-18 2017-04-21 F. Hoffmann-La Roche Ag Method for the determination of sequence variants of polypeptides
JP5673348B2 (en) * 2011-05-25 2015-02-18 株式会社島津製作所 Mass spectrometry data analysis method and analysis apparatus

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999062930A2 (en) * 1998-06-03 1999-12-09 Millennium Pharmaceuticals, Inc. Protein sequencing using tandem mass spectroscopy

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FERNANDEZ-DE-COSSIO JORGE ET AL: "Automated interpretation of high-energy collision-induced dissociation spectra of singly protonated peptides by 'SeqMS', a software aid for De Novo sequencing by tandem mass spectrometry.", RAPID COMMUNICATIONS IN MASS SPECTROMETRY, vol. 12, no. 23, 1998, pages 1867 - 1878, XP009004875, ISSN: 0951-4198 *
SHEVCHENKO A ET AL: "Rapid 'de Novo' Peptide Sequencing by a Combination of Nanoelectrospray, Isotopic Labeling and a Quadrupole/Time-of-Flight Mass Spectrometer", RAPID COMMUNICATIONS IN MASS SPECTROMETRY, HEYDEN, LONDON, GB, vol. 11, 1997, pages 1015 - 1024, XP002101143, ISSN: 0951-4198 *
TAKAO TOSHIFUMI ET AL: "Automatic precursor-ion switching in a four-sector tandem mass spectrometer and its application to acquisition of the MS/MS product ion derived from a partially oxygen-18 labeled peptide for their facile assignments.", ANALYTICAL CHEMISTRY, vol. 65, no. 17, 1993, pages 2394 - 2399, XP002229100, ISSN: 0003-2700 *
UTTENWEILER-JOSEPH SANDRINE ET AL: "Automated de novo sequencing of proteins using the differential scanning technique.", PROTEOMICS, vol. 1, no. 5, May 2001 (2001-05-01), pages 668 - 682, XP002229098, ISSN: 1615-9853 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006284305A (en) * 2005-03-31 2006-10-19 Nec Corp Analysis method and analysis system
US8067729B2 (en) 2005-05-13 2011-11-29 Shimadzu Corporation Mass analysis data analyzing apparatus and program thereof
DE102005028944A1 (en) * 2005-06-22 2007-01-04 Christine Bettendorf Method for identifying chemical structures based on differential mass spectra
WO2018073404A1 (en) * 2016-10-20 2018-04-26 Vito Nv Monoisotopic mass determination of macromolecules via mass spectrometry
US11378581B2 (en) 2016-10-20 2022-07-05 Vito Nv Monoisotopic mass determination of macromolecules via mass spectrometry
CN111307921A (en) * 2019-11-26 2020-06-19 中国工程物理研究院材料研究所 Method and device for quadrupole mass spectrometry analysis of hydrogen isotope gas abundance with absolute quantity measurement
CN111307921B (en) * 2019-11-26 2022-11-25 中国工程物理研究院材料研究所 Method and device for quadrupole mass spectrometry analysis of hydrogen isotope gas abundance with absolute quantity measurement
WO2025156912A1 (en) * 2024-01-22 2025-07-31 中国人民解放军军事科学院军事医学研究院 Direct-read-accuracy de novo sequencing method for proteins and proteomes, based on product ion type identification and full product ion coverage

Also Published As

Publication number Publication date
JP2005510732A (ja) 2005-04-21
AU2002218321A1 (en) 2003-06-10
CA2468689A1 (fr) 2003-06-05

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZM ZW
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase Ref document number: 2468689; Country of ref document: CA; Ref document number: 2003547965; Country of ref document: JP
122 EP: PCT application non-entry in European phase