WO2004006159A1 - Method and system for picking peaks for mass spectra - Google Patents

Method and system for picking peaks for mass spectra

Info

Publication number
WO2004006159A1
Authority
WO
WIPO (PCT)
Prior art keywords
peak
peaks
mass
monoisotopic
spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/AU2003/000878
Other languages
French (fr)
Inventor
Edmond Joseph Breen
Femia G. Hopwood
Marc R. Wilkins
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Proteome Systems Intellectual Property Pty Ltd
Original Assignee
Proteome Systems Intellectual Property Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2002950103A (AU2002950103A0)
Priority claimed from AU2002950064A (AU2002950064A0)
Application filed by Proteome Systems Intellectual Property Pty Ltd
Priority to AU2003243824A (AU2003243824A1)
Publication of WO2004006159A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H01: ELECTRIC ELEMENTS
    • H01J: ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00: Particle spectrometers or separator tubes
    • H01J49/26: Mass spectrometers or separator tubes
    • H01J49/34: Dynamic spectrometers
    • H01J49/40: Time-of-flight spectrometers

Definitions

  • This invention relates generally to a method of picking peaks from mass spectra.
  • the invention relates to a system and method for automatically picking monoisotopic peaks from spectra generated on matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) mass spectrometers.
  • MALDI-TOF matrix-assisted laser desorption/ionisation time of flight
  • Mass spectrometers are being increasingly applied to protein identification and characterisation in proteomics. Many of these mass spectrometers are now offered with an advanced capacity for automatic analysis of samples, bringing true high throughput capacity to the field.
  • MALDI-TOF mass spectrometers from Kratos and Bruker are equipped with 384 sample targets. Typically, a plate containing this number of samples can be analysed automatically overnight, to yield a parent ion mass spectrum for each sample.
  • Some MALDI-TOF instruments, namely those equipped with curved field reflectrons, are also capable of undertaking automatic post source decay (PSD) analysis of suitable peptides identified in the parent ion spectra.
  • PSD automatic post source decay
  • the present invention seeks to provide improvements in the identification of monoisotopic peak masses in mass spectra, particularly peptide mass spectra, which address and alleviate some or all of the drawbacks of the prior art.
  • Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed in Australia before the priority date of each claim of this application.
  • the invention provides an improved method of re-sampling and reprocessing peptide mass spectra and a mathematical means to deal with "alignment error" during fitting of identified peaks to a Poisson intensity model.
  • re-sampling is carried out by applying a "nearest neighbour" interpolation.
  • a mathematical morphology filter is applied.
  • the step of compensating for alignment error may involve, for each candidate monoisotopic position, inspecting the upstream position for potential isotopes by collecting at intervals the largest stick value within a specified alignment error range.
  • the invention also provides a post-processing heuristic for checking resulting monoisotopic peak lists for error.
  • the step of checking resulting monoisotopic peak lists for error may involve further scanning to identify peaks that are within a few Daltons of each other and applying specific criteria to eliminate noise.
  • the methods of the present invention may include the steps of: conversion of mass spectra into stick representations; and the application of Poisson modelling of theoretical isotope distribution to derive the monoisotopic mass from an isotopically resolved group of peaks. More specifically, the present invention provides a method for automatically picking monoisotopic peaks from mass spectrum data comprising the steps of: applying nearest-neighbour interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; reducing the data to a stick representation by replacing each isolated peak by a stick located at the centroid determined at a percentage of the peak's maximum height; compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ±
  • the spectra are generated on matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) mass spectrometers.
  • the percentage is preferably from 50% to 100% of the peak's maximum height, most preferably from 65% to 75% of the peak's maximum height.
  • a method for identifying monoisotopic peaks from mass spectrum data comprising the steps of: applying interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; selecting all peaks that have a width greater than about 1 Dalton and are above a predetermined noise level; finding the location of the maximum for each peak; given the peak's mass, fitting a Poisson distribution such that the distribution's maximum intensity aligns with the maximum of the peak; deriving the mass of the monoisotope from the distribution; and compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's
  • Figure 1 illustrates examples of interpolation schemes, where solid dots represent original data points and crosses represent sampled points
  • Figure 2 shows examples of nearest-neighbour interpolation on a 2800 Dalton peptide distribution
  • Figure 3 shows an example of openings and closings with a line of length r
  • Figures 4a and 4b show an example of background subtraction from a MALDI-TOF spectrum
  • Figure 5 shows an estimated noise level in a region of a MALDI-TOF spectrum
  • Figure 6 shows a watershed segmentation of spectra to isolate major peaks
  • Figure 7 shows spectra processed into "sticks"
  • Figure 8 shows Poisson models for deamidation for mass 3410 Da, with the top signal representing the monoisotopic peptide, the middle signal representing the deamidated version and the bottom signal representing the summation of the top and middle signals;
  • Figure 9 illustrates alignment error; the peak at position 0 represents a monoisotopic peak, and the locations for isotopes at 1 and 2 Daltons upstream are marked, with the error range given by e;
  • Figure 10 shows a PMF mass spectrum of Chain 1: Integrin beta-3 identified from two dimensional electrophoresis (2DE) of human platelet proteins;
  • Figure 11 shows two sets of overlapping distributions from PMF mass spectrum of Chain 1: Integrin beta-3;
  • Figure 12 is an example of peaks harvested from different concentrations of
  • Figures 13a and 13b illustrate the effect of alignment error upon the harvesting of spectra from PVDF membranes
  • Figure 14 is a PSD (post source decay) mass spectrum of the 1791.05 Da peptide from a tryptic digest of Chain 1: Actin, cytoplasmic 1 or 2;
  • Figures 15a to c show PSD masses harvested from the PSD mass spectrum of the
  • Figure 16 is a PSD mass spectrum of the 1130.67 Da peptide from a tryptic digest of Beta tubulin 1, class VI;
  • Figure 17 shows PSD masses harvested from the PSD mass spectrum of the 1130.67 Da peptide from a tryptic digest of Beta tubulin 1, class VI, dJ543J19.4
  • One Dimensional Gel Electrophoresis (1DE) of BSA: Titrated amounts of serum albumin from Bos taurus (BSA) were prepared in
  • Enzymatic digestions were carried out directly on proteins blotted onto PVDF membranes.
  • 1DE BSA proteins were first electroblotted onto Immobilon-P ⁇ PVDF membranes (Millipore, Bedford, MA), using a prototype ElectrophoretIQ™ electroblotting apparatus (PSL, Sydney, Australia). Electroblotting was carried out at 400 mA for 1.3 hr applying methods described by Kyhse-Andersen [6], followed by protein staining using Direct Blue 71 (Sigma-Aldrich, St. Louis, MO).
  • the PVDF membranes were then adhered to an Axima-CFR MALDI target plate (Kratos, Manchester, UK) and tryptic digestion was carried out upon the protein bands using porcine trypsin (Sigma-Aldrich, St. Louis, MO), followed by matrix addition (α-cyano-4-hydroxycinnamic acid) to the resultant peptides as described in [7]. All dispensing of chemicals to the membrane was carried out using an ⁇-version Chemical Printer jointly developed by Proteome Systems Ltd. (PSL) (Sydney, Australia) and Shimadzu-Biotech (Kyoto, Japan).
  • Human platelets were purchased from the Red Cross Blood Bank (Sydney, Australia). Contaminating red blood cells were removed from the platelets by centrifugation at 200 x g for 10 minutes at 4°C. The platelet-rich plasma was then centrifuged at 1500 x g for 20 minutes at 4°C. The platelet component of the pellet was gently re-suspended in 50 mM Tris-HCl, 90 mM NaCl, 5 mM EDTA, pH 7.4 and washed twice more. The platelet pellet was freeze dried overnight.
  • a crude platelet membrane preparation was prepared by suspending 200 mg of lyophilized platelets in 10 ml of 100 mM sodium carbonate and sonicating at 70% intensity in a Branson Digital sonicator Model 450 four times for 15 seconds whilst keeping cool on ice. After sonication the sample was stirred for 1 hour at 4°C. The sample was centrifuged at 115000 x g for 75 minutes at 4°C. The pellet was re-suspended in 50 mM Tris pH 7.3 with the assistance of an ultrasonic bath. The centrifugation and re-suspension were repeated another two times.
  • the final pellet was re-suspended in 2-5 ml of 7 M urea, 2 M thiourea, 1% C7, 40 mM Tris. TBP was added to a final concentration of 5 mM and the sample incubated at room temperature for 1 hour. Acrylamide was added at a final concentration of 10 mM for 1 hour and a protein estimation performed to obtain a final concentration of 4 mg/ml. Before re-hydration of the IPG strips, the sample was ultra-sonicated for 2 minutes and then centrifuged at 21000 x g for 5 minutes. The supernatant was collected and 10 µl of Orange G finally added as an indicator dye.
  • protein gel pieces were excised using a prototype Xcise™ system (PSL, Sydney, Australia and Shimadzu-Biotech, Kyoto, Japan) and then washed with 25 mM NH4HCO3, pH 8.5. The gel pieces were then dehydrated under vacuum for 15 minutes and digested with 10 µL of 20 µg/mL porcine trypsin in 25 mM NH4HCO3, pH 8.5, overnight at 30°C. Peptides were extracted from gel pieces with 10 µL of 0.5% (v/v) formic acid and sonication for 10 minutes. Prior to MALDI-TOF MS analysis, peptides were concentrated and purified using a C18 ZipTip® (Millipore, Bedford, MA) and eluted onto a target plate in 2 µL of matrix solution and allowed to dry.
  • C18 ZipTip® (Millipore, Bedford, MA)
  • Time-of-flight (TOF) mass spectrometers sample spectra, S, linearly in the time domain, t. Because TOF is proportional to the square root of the mass-to-charge (m/z) ratio of the ions, the data are unevenly spaced once the spectrum is converted to the mass domain. This is a problem because it is easier to develop analysis procedures based on evenly sampled data, and in the case of TOF-MS the mass (m/z) difference between sample points is not constant. Hence, it is necessary to resample the mass spectra at even mass intervals.
  • nearest neighbour interpolation tends to be the more reliable resampling routine; particularly due to the importance of peak height information when dealing with overlapping distributions, as will be discussed in more detail below. Examples of the nearest-neighbour interpolation applied to a peptide distribution are shown in Fig 2b.
  • a mathematical morphology filter is any morphological filter that can be expressed in terms of erosions and dilations.
  • the erosion ε_B(S) of a spectrum S with a structuring element B is denoted by:
  • the dilation δ_B(S) of a spectrum S with a structuring element B is denoted by:
  • FIG. 3 provides an example of an opening (a) and a closing (b). Note that an opening attenuates peaks, while a closing fills in troughs. The amount of attenuation or filling is determined by the length of the line r, as seen in Fig. 3.
  • a prominent feature in raw spectra is the common presence of a background trend; where the low mass range of the spectrum does not reach the baseline illustrated in Fig. 4a.
  • N(S)(m) ⁇ f 0 if J U( K S)( ⁇ m) J - L( K S) J (m) ⁇ 0 ⁇
  • Figure 5 shows the estimated noise level in a region of a MALDI-TOF spectrum, in which the line 10 represents the noise level, which can vary from point to point as given by equation 9.
  • the Poisson model is a probability distribution that we use to relate the number of atoms, n, to the proportion, p, of its isotopes:
  • M, the mean of the distribution, represents the product np.
  • a mapping function F: m → M, where m is mass, is derived by defining a hypothetical average amino acid u = C10H16N3O3 (scaled to whole numbers), forming peptides composed of 1 to 15 repeating units of u (corresponding to peptide masses between 245.1376 and 3410.8059 Da), and deriving the mapping function:
  • the alignment error is generally due to limitations of mass resolution, peak asymmetry and sample rate.
  • the signal upstream is inspected for potential isotopes. This is achieved by collecting at 1 Da intervals the largest stick value within a specified alignment error range, ± e, centred at each Da position with respect to the monoisotopic peak's mass position m. This is illustrated in Figure 9, where the peak at position 0 represents a monoisotopic peak, and the locations for isotopes at 1 and 2 Daltons upstream are marked:
  • H can be viewed as representing the weighted average height of the peaks within the distribution P.
  • the height of the distribution can then be compared with the noise level at that location in the spectrum:
  • I is an indicator function for whether the average height of the distribution exceeds the level of the noise N by a specified amount z. All distributions that pass the threshold are considered to be valid distributions; their monoisotopic mass is recorded and the contribution of that distribution to the entire signal is subtracted. The procedure then continues from the next position in the signal, examining each further peak as a candidate monoisotopic peak. Note that this approach inherently handles overlapping distributions [1], as illustrated in Fig. 8.
  • the threshold approach described above generally picks a superset of monoisotopic peaks; depending on the chosen value of z.
  • the list of monoisotopic peaks is further scanned to identify peaks that are within a few Daltons of each other. For all peaks that satisfy this condition the following criteria are used to eliminate the noise: if the monoisotopic peak sp at position (k) has a neighbour (k + 1) that is less than 3 Da away, sp(k) is a valid peak iff ⁇ > 0.2. Similarly, if sp(k) has a neighbour (k - 1) that is less than 3 Da away, it is a valid peak iff ⁇ y > 0.6.
  • This treatment helps ensure that any distribution that overlaps another is not simply due to model misfitting or low signal.
  • routines of the present invention applied to peptide mass fingerprinting (PMF) data generated directly from a PVDF surface (on which blotted proteins have been digested) and in the detection of overlapping distributions from PMF data collected following 2DE of platelet proteins.
  • PMF peptide mass fingerprinting
  • the following examples also demonstrate peak harvesting of post source decay (PSD) data collected following 2DE of platelet proteins.
  • a MALDI-TOF mass spectrum was collected from a Coomassie stained protein spot isolated from a 2DE of human platelet proteins, shown in Fig. 10.
  • a peak list from the resultant mass spectrum was processed using a maximum alignment error of 0.1 Da (the optimum error for PMF data).
  • the peak list was searched against PMF databases using the ExPASy Peptident tool and the protein was identified as Chain 1: Integrin Beta-3 (Accession number P05106), with 26 peptides matched, covering 31% of the protein sequence.
  • Fig. 10 contains a large number of tryptic peptide peaks, resulting in a number of regions of overlapping distributions.
  • Expanded views of two of these regions are illustrated in Figures 11a and 11b, and in both cases the method of the present invention identifies multiple isotopic distributions.
  • the monoisotopic masses derived and corresponding peptide sequences are shown in Table 1. All five sequences were matched to tryptic peptides of Chain 1: Integrin Beta-3.
  • Fig. 11a two overlapping distributions (1221.59, 1223.58
  • the BSA data was re-harvested with an alignment error of 0.1 Da.
  • all peptides were correctly harvested with the exception of the 1193.68 Da peptide in the 12.5 fmol sample, as shown in Figure 13.
  • the distribution of the 1193.69 Da peptide did not fall within the 0.1 Da alignment error and this peptide was rejected as a potential monoisotopic candidate.
  • the next mass at 1194.80 Da was then considered as a potential monoisotopic candidate and in this case, the isotopic distribution fitted the Poisson distribution; therefore incorrectly assigning the 1194.80 Da peptide as a monoisotopic mass.
  • the system described herein is sufficiently flexible to allow peak harvesting parameters to be optimised and preset for each type of system being analysed (i.e. our standard procedure is to harvest MALDI-TOF data with an alignment error of 0.1 Da and PSD data with an alignment error of 0.2 Da), therefore removing the need for users to search with multiple parameter combinations for each sample.
  • the final example of the method illustrates its ability to harvest peaks from PSD mass spectra.
  • these spectra tend to be more complicated than PMF data because the spectra generally include a lot of very small broad peaks and, due to a broader range of laser powers used, isotopic resolution varies from well resolved to poorly resolved (or completely unresolved) peptide fragments.
  • the two forms of Chain 1 Actin identified, P02570 and P02571, differ only in the starting sequence of the first 9 residues. However, as neither of these residues was identified in the PMF analysis, we cannot distinguish between the presence of either (or both) proteins.
  • PSD was carried out on one peptide from each protein to confirm the identification (see Table 2). In the PSD analysis of Actin, fragmentation was carried out on the 1791.05 Da peptide, resulting in broad fragments with almost no isotopic resolution. These fragment peaks were detected using the method where the harvesting parameters have been previously optimised for PSD data.
  • the PSD mass spectrum shown in Fig. 14 shows a large number of PSD fragments. A few examples demonstrating the successful harvesting of these unresolved PSD fragments are shown in Figure 15.
  • the resultant peak list from the peak harvester was analysed using an in-house PSD database search engine. The results showed that the peptide fragments matched to 28 b and y fragments of the peptide sequence SYELPDGQVITIGNER of the protein Actin (cytoplasmic 1 or 2).
  • the PSD data is better resolved and therefore provides a good comparison of the ability of the method to successfully harvest PSD spectra despite large variability in the nature of that data.
  • PSD analysis was carried out on the 1130.67 Da peptide from Beta tubulin 1, class VI, dJ543J19.4 and the resultant spectrum was harvested with the same parameters as used for the Actin example above. Examples of the fragments harvested with Peak Harvester are shown in Fig. 17.
  • the peaks at masses 429 and 982 include some resolution of the second isotope.
  • the method has successfully derived the near monoisotopic masses (within 1 Da of the database mass). The resultant peaks were searched with an in-house PSD database search engine, resulting in the identification of 17 b and y fragments from the protein sequence FPGQLNADLR of the protein Beta tubulin 1, class VI, dJ543J19.4.
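The mass range quoted above for peptides built from the hypothetical average amino acid C10H16N3O3 (245.1376 to 3410.8059 Da for 1 to 15 units) can be checked with a short calculation. The following is a minimal sketch only: the element masses are standard monoisotopic values, and the end-group correction of H2O plus one hydrogen is an assumption inferred because it reproduces the two quoted masses; the names `formula_mass` and `peptide_mass` are hypothetical.

```python
# Standard monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}

# Hypothetical average amino acid from the text: C10 H16 N3 O3.
UNIT = {"C": 10, "H": 16, "N": 3, "O": 3}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


UNIT_MASS = formula_mass(UNIT)

# Assumption: each peptide carries one water plus one hydrogen in addition
# to its repeating units; this choice reproduces the masses quoted in the text.
END_GROUPS = formula_mass({"H": 3, "O": 1})


def peptide_mass(n_units):
    """Mass of a peptide made of n_units repeats of the average amino acid."""
    return n_units * UNIT_MASS + END_GROUPS


masses = [peptide_mass(n) for n in range(1, 16)]
print(round(masses[0], 4), round(masses[-1], 4))
```

These fifteen masses are the points at which isotope proportions would be evaluated when deriving a mass-to-mean mapping; the mapping function itself is not reproduced here.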

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

A method for picking monoisotopic peaks from mass spectrometer data includes applying nearest-neighbour interpolation, removing noise, extracting regional maxima, applying a watershed segmentation to obtain isolated peaks, reducing the data to a stick representation based on the peak centroids or peak maxima, compensating for alignment error with respect to the monoisotopic peak position, fitting a Poisson model to the isotope represented by each stick value, subtracting the model distribution from the signal, and examining each further remaining peak as a candidate monoisotopic peak. The method is applicable to the case of overlapping isotopic distributions, and allows high throughput protein identification via peptide mass fingerprinting and post source decay data from MALDI-TOF mass spectra.

Description

"Method and system for picking peaks from mass spectra"
Field of the Invention
This invention relates generally to a method of picking peaks from mass spectra. In particular, the invention relates to a system and method for automatically picking monoisotopic peaks from spectra generated on matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) mass spectrometers.
Background of the Invention
Mass spectrometers are being increasingly applied to protein identification and characterisation in proteomics. Many of these mass spectrometers are now offered with an advanced capacity for automatic analysis of samples, bringing true high throughput capacity to the field. For example, MALDI-TOF mass spectrometers from Kratos and Bruker are equipped with 384 sample targets. Typically, a plate containing this number of samples can be analysed automatically overnight, to yield a parent ion mass spectrum for each sample. Some MALDI-TOF instruments, namely those equipped with curved field reflectrons, are also capable of undertaking automatic post source decay (PSD) analysis of suitable peptides identified in the parent ion spectra. Together, these analyses can generate many hundreds of mass spectra per instrument per day, which forms a massive task to interpret manually. Unfortunately, the situation is even more challenging in LC-MSMS instruments, which can generate thousands of spectra per day if long runs and high scanning frequencies are used. Clearly, there is a requirement for the automation of spectral analysis, as well as acquisition, to take advantage of the throughput that is currently available.
A number of groups have recently described methods to automatically process peptide mass spectra to yield monoisotopic masses. Such methods are described in a number of papers including "Automatic Poisson peak harvesting for high throughput protein identification" by Breen E.J. et al in Electrophoresis, 21:2243-2251, 2000 [1]; "Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimised peak detection", by R. Gras et al in Electrophoresis, 20:3535-3550, 1999 [2]; "Isotopic de-convolution of matrix assisted laser desorption ionisation mass spectra for substance-class specific analysis of complex samples" by M. Wehofsky et al, in Eur. J. Mass Spectrom., 37:223-229, 2002 [3]; "Automated deconvolution and deisotoping of electrospray mass spectra" by M. Wehofsky et al in J. Mass Spectrom., 37:223-229, 2002 [4]; and a paper by D. M. Horn et al in J. Am. Soc. Mass Spectrom., 11:320, 2000 [5]. A variety of approaches have been used by the different groups, which generally involve the removal of background noise in spectra, approaches for finding the peaks, a means of modelling isotopic ratios between the peaks in an isotopic cluster, and the application of this model to the processed spectrum to identify the monoisotopic peaks.
There are a number of drawbacks with various of these approaches. For example, they do not generally cope with large masses, overlapping peptides or deamidation. The utility of some of the approaches is also questionable.
The present invention seeks to provide improvements in the identification of monoisotopic peak masses in mass spectra, particularly peptide mass spectra, which address and alleviate some or all of the drawbacks of the prior art.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed in Australia before the priority date of each claim of this application.
Summary of the Invention
Broadly, the invention provides an improved method of re-sampling and reprocessing peptide mass spectra and a mathematical means to deal with "alignment error" during fitting of identified peaks to a Poisson intensity model.
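As an illustration of the Poisson intensity model referred to above, the relative isotope intensities can be written with the standard Poisson probability mass function, P(k; M) = exp(-M) M^k / k!. This is a minimal sketch under stated assumptions: index k = 0 is taken to be the monoisotopic peak, the mean M is supplied by the caller, and the function name `poisson_isotope_model` is hypothetical.

```python
import math


def poisson_isotope_model(mean, n_peaks):
    """Relative intensities P(k; M) = exp(-M) * M**k / k! for k = 0..n_peaks-1.

    k = 0 is assumed to correspond to the monoisotopic peak.
    """
    return [
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(n_peaks)
    ]


# With M = 1 the monoisotopic peak and the first isotope have equal height.
model = poisson_isotope_model(mean=1.0, n_peaks=5)
print([round(p, 4) for p in model])
```

Larger means shift the tallest peak away from k = 0, which is why the monoisotopic peak of a heavy peptide is not necessarily the most intense peak of its cluster.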
In one particularly preferred embodiment, re-sampling is carried out by applying a "nearest neighbour" interpolation.
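A nearest-neighbour resampling of this kind might look as follows. This is a sketch only, assuming the spectrum is held as two parallel lists sorted by ascending mass and that the even grid starts at the first mass; the function name `resample_nearest` and the `step` parameter are illustrative and not taken from the specification.

```python
from bisect import bisect_left


def resample_nearest(masses, intensities, step):
    """Resample a spectrum onto an evenly spaced mass grid.

    Each grid point takes the intensity of the closest original sample,
    so no new intensity values are created by the resampling.
    `masses` must be sorted in ascending order.
    """
    n_points = int((masses[-1] - masses[0]) / step) + 1
    grid = [masses[0] + i * step for i in range(n_points)]
    resampled = []
    for g in grid:
        j = bisect_left(masses, g)
        if j == 0:
            nearest = 0
        elif j == len(masses):
            nearest = j - 1
        else:
            # Pick whichever neighbour is closer to the grid point.
            nearest = j if masses[j] - g < g - masses[j - 1] else j - 1
        resampled.append(intensities[nearest])
    return grid, resampled


# TOF-like data: the gap between samples grows with mass.
masses = [1000.00, 1000.10, 1000.22, 1000.36, 1000.52]
intensities = [1.0, 4.0, 9.0, 3.0, 1.0]
grid, resampled = resample_nearest(masses, intensities, step=0.1)
print(resampled)
```

Because every output value is copied from an original sample, the peak apex in this example survives unchanged, whereas an averaging interpolation could lower it.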
Preferably, following re-sampling, a mathematical morphology filter is applied. The step of compensating for alignment error may involve, for each candidate monoisotopic position, inspecting the upstream position for potential isotopes by collecting at intervals the largest stick value within a specified alignment error range.
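The morphological filtering step can be sketched with the standard definitions of erosion (a sliding minimum) and dilation (a sliding maximum) for a flat line structuring element. The code below is illustrative only: the element is assumed to be a centred line of `r` samples with `r` odd, the window is simply truncated at the ends of the signal, and all function names are hypothetical.

```python
def erode(signal, r):
    """Erosion by a flat, centred line of r samples: sliding minimum."""
    half = r // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, r):
    """Dilation by a flat, centred line of r samples: sliding maximum."""
    half = r // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, r):
    """Erosion followed by dilation: removes peaks narrower than the line."""
    return dilate(erode(signal, r), r)


def closing(signal, r):
    """Dilation followed by erosion: fills troughs narrower than the line."""
    return erode(dilate(signal, r), r)


spectrum = [2, 2, 9, 2, 2, 0, 2, 2]
opened = opening(spectrum, 3)   # the one-sample spike is flattened
closed = closing(spectrum, 3)   # the one-sample trough is filled
print(opened, closed)
```

Lengthening the line removes wider peaks or fills wider troughs, so the line length acts as the tuning parameter of the filter.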
In a related aspect, the invention also provides a post-processing heuristic for checking resulting monoisotopic peak lists for error.
The step of checking resulting monoisotopic peak lists for error may involve further scanning to identify peaks that are within a few Daltons of each other and applying specific criteria to eliminate noise.
Broadly, the methods of the present invention may include the steps of: conversion of mass spectra into stick representations; and the application of Poisson modelling of theoretical isotope distribution to derive the monoisotopic mass from an isotopically resolved group of peaks. More specifically, the present invention provides a method for automatically picking monoisotopic peaks from mass spectrum data comprising the steps of: applying nearest-neighbour interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; reducing the data to a stick representation by replacing each isolated peak by a stick located at the centroid determined at a percentage of the peak's maximum height; compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's mass position, each stick value representing an isotope; fitting a Poisson model to each isotope thus collected; treating all distributions whose average height exceeds a noise level by a specified amount as valid; subtracting the contribution of that distribution from the overall signal; and repeating the process from the next position in the signal, examining each further peak as a candidate monoisotopic peak.
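The regional-maxima and watershed steps listed above can be sketched for a one dimensional signal as follows. This is an illustrative reading rather than the specified implementation: in one dimension, flooding from maxima used as seeds is taken to be equivalent to cutting the signal at the lowest sample between each pair of adjacent maxima, and the function names and the half-open segment convention are assumptions.

```python
def regional_maxima(signal):
    """Return (start, end) index pairs of plateaus higher than both neighbours."""
    maxima = []
    n = len(signal)
    i = 0
    while i < n:
        j = i
        # Extend over a flat plateau of equal values.
        while j + 1 < n and signal[j + 1] == signal[i]:
            j += 1
        left_lower = i == 0 or signal[i - 1] < signal[i]
        right_lower = j == n - 1 or signal[j + 1] < signal[i]
        if left_lower and right_lower:
            maxima.append((i, j))
        i = j + 1
    return maxima


def watershed_1d(signal):
    """Split the signal into one half-open segment [start, end) per maximum.

    Each boundary is placed at the lowest sample between adjacent maxima.
    """
    maxima = regional_maxima(signal)
    cuts = []
    for (_, a_end), (b_start, _) in zip(maxima, maxima[1:]):
        valley = min(range(a_end + 1, b_start), key=lambda k: signal[k])
        cuts.append(valley)
    bounds = [0] + cuts + [len(signal)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(maxima))]


signal = [0, 3, 7, 3, 1, 4, 9, 9, 2, 0]
print(regional_maxima(signal))  # one single-point maximum, one plateau
print(watershed_1d(signal))     # one segment per isolated peak
```

Running the segmentation on a morphologically cleaned spectrum matters here, since every surviving noise bump would otherwise seed its own segment.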
Typically the spectra are generated on matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) mass spectrometers.
The percentage is preferably from 50% to 100% of the peak's maximum height, most preferably from 65% to 75% of the peak's maximum height.
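A stick located at the centroid determined at a percentage of the peak's maximum height could be computed along the following lines. This is a minimal sketch, assuming an intensity-weighted centroid over the samples at or above the cut-off and a stick height equal to the peak maximum; the function name `stick_from_peak` and the default of 70% (chosen from within the most preferred 65% to 75% range) are illustrative.

```python
def stick_from_peak(masses, intensities, fraction=0.7):
    """Replace one isolated peak by a (centroid mass, height) stick.

    Only samples at or above `fraction` of the peak maximum contribute
    to the intensity-weighted centroid, which reduces the influence of
    asymmetric tails.
    """
    top = max(intensities)
    cutoff = fraction * top
    kept = [(m, y) for m, y in zip(masses, intensities) if y >= cutoff]
    centroid = sum(m * y for m, y in kept) / sum(y for _, y in kept)
    return centroid, top


peak_masses = [1000.0, 1000.1, 1000.2, 1000.3, 1000.4]
peak_heights = [1.0, 8.0, 10.0, 8.0, 1.0]
centroid, height = stick_from_peak(peak_masses, peak_heights)
print(round(centroid, 4), height)
```

Setting `fraction` to 1.0 reduces the centroid to the position of the peak maximum itself.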
In a related aspect there is provided a method for identifying monoisotopic peaks from mass spectrum data comprising the steps of: applying interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; selecting all peaks that have a width greater than about 1 Dalton and are above a predetermined noise level; finding the location of the maximum for each peak; given the peak's mass, fitting a Poisson distribution such that the distribution's maximum intensity aligns with the maximum of the peak; deriving the mass of the monoisotope from the distribution; and compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's mass position, each stick value representing an isotope.
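The alignment-error compensation described above, collecting at 1 Da intervals the largest stick within ± e of each expected isotope position, can be sketched as follows. The representation of sticks as (mass, height) tuples, the use of `None` for a position where no stick falls inside the window, and the function name `collect_isotopes` are all assumptions made for illustration.

```python
def collect_isotopes(sticks, mono_mass, e, n_isotopes):
    """Collect one stick per expected isotope position.

    For k = 0 .. n_isotopes - 1, look in the window mono_mass + k +/- e
    and keep the tallest stick found there, or None if the window is empty.
    `sticks` is a list of (mass, height) tuples.
    """
    picked = []
    for k in range(n_isotopes):
        centre = mono_mass + k  # isotopes are expected 1 Da apart
        in_window = [s for s in sticks if abs(s[0] - centre) <= e]
        picked.append(max(in_window, key=lambda s: s[1]) if in_window else None)
    return picked


sticks = [(1000.50, 100.0), (1001.45, 60.0), (1001.58, 5.0), (1002.80, 20.0)]
picked = collect_isotopes(sticks, mono_mass=1000.50, e=0.1, n_isotopes=3)
print(picked)
```

In this example the small stick at 1001.58 Da falls inside the window but loses to the taller one at 1001.45 Da, and the stick at 1002.80 Da lies outside the ±0.1 Da window and is not collected; widening `e` would admit it.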
Brief Description of the Drawings
Specific embodiments of the invention will now be described by way of example only, and with reference to the accompanying drawings in which:
Figure 1 illustrates examples of interpolation schemes, where solid dots represent original data points and crosses represent sampled points;
Figure 2 shows examples of nearest-neighbour interpolation on a 2800 Dalton peptide distribution;
Figure 3 shows an example of openings and closings with a line of length r;
Figures 4a and 4b show an example of background subtraction from a MALDI-TOF spectrum;
Figure 5 shows an estimated noise level in a region of a MALDI-TOF spectrum;
Figure 6 shows a watershed segmentation of spectra to isolate major peaks;
Figure 7 shows spectra processed into "sticks";
Figure 8 shows Poisson models for deamidation for mass 3410 Da, with the top signal representing the monoisotopic peptide, the middle signal representing the deamidated version and the bottom signal representing the summation of the top and middle signals;
Figure 9 illustrates alignment error; the peak at position 0 represents a monoisotopic peak, and the locations for isotopes at 1 and 2 Daltons upstream are marked, with the error range given by e;
Figure 10 shows a PMF mass spectrum of Chain 1: Integrin beta-3 identified from two dimensional electrophoresis (2DE) of human platelet proteins;
Figure 11 shows two sets of overlapping distributions from the PMF mass spectrum of Chain 1: Integrin beta-3;
Figure 12 is an example of peaks harvested from different concentrations of BSA electroblotted from a 1DE to a PVDF membrane;
Figures 13a and 13b illustrate the effect of alignment error upon the harvesting of spectra from PVDF membranes;
Figure 14 is a PSD (post source decay) mass spectrum of the 1791.05 Da peptide from a tryptic digest of Chain 1: Actin, cytoplasmic 1 or 2;
Figures 15a to c show PSD masses harvested from the PSD mass spectrum of the 1791.05 Da peptide of Chain 1: Actin, cytoplasmic 1 or 2;
Figure 16 is a PSD mass spectrum of the 1130.67 Da peptide from a tryptic digest of Beta tubulin 1, class VI; and
Figure 17 shows PSD masses harvested from the PSD mass spectrum of the 1130.67 Da peptide from a tryptic digest of Beta tubulin 1, class VI, dJ543J19.4.
Detailed Description of Preferred Embodiments
Example
One Dimensional Gel Electrophoresis (1DE) of BSA
Titrated amounts of serum albumin from Bos taurus (BSA) were prepared in SDS-PAGE sample buffer and reduced and alkylated (using dithiothreitol and acrylamide, respectively) for 1 hr at room temperature prior to electrophoresis. The sample was electrophoresed on 6-15% (w/v) polyacrylamide ProteoGel™ (Sigma-Aldrich, St. Louis, MO) following the manufacturer's instructions.
On-membrane Protein Digestions
Enzymatic digestions were carried out directly on proteins blotted onto PVDF membranes. 1DE BSA proteins were first electroblotted onto Immobilon-P PVDF membranes (Millipore, Bedford, MA), using a prototype ElectrophoretIQ™ electroblotting apparatus (PSL, Sydney, Australia). Electroblotting was carried out at 400 mA for 1.3 hr applying methods described by Kyhse-Andersen [6], followed by protein staining using Direct Blue 71 (Sigma-Aldrich, St. Louis, MO). The PVDF membranes were then adhered to an Axima-CFR MALDI target plate (Kratos, Manchester, UK) and tryptic digestion was carried out upon the protein bands using porcine trypsin (Sigma-Aldrich, St. Louis, MO), followed by matrix addition (α-cyano-4-hydroxycinnamic acid) to the resultant peptides as described in [7]. All dispensing of chemicals to the membrane was carried out using an α-version Chemical Printer jointly developed by Proteome Systems Ltd. (PSL) (Sydney, Australia) and Shimadzu-Biotech (Kyoto, Japan). Glass capillary piezoelectric devices (Microfab Technologies, Inc., Plano, TX) were used to micro-dispense all solutions, which were pre-filtered through membrane filters (Millipore, Bedford, MA) prior to dispensing.
Two-Dimensional Electrophoresis (2DE) of Platelet Proteins and In-gel Digestion
Human platelets were purchased from the Red Cross Blood Bank (Sydney, Australia). Contaminating red blood cells were removed from the platelets by centrifugation at 200 x g for 10 minutes at 4°C. The platelet-rich plasma was then centrifuged at 1500 x g for 20 minutes at 4°C. The platelet component of the pellet was gently re-suspended in 50 mM Tris-HCl, 90 mM NaCl, 5 mM EDTA, pH 7.4 and washed twice more. The platelet pellet was freeze dried overnight. A crude platelet membrane preparation was prepared by suspending 200 mg of lyophilized platelets in 10 ml of 100 mM sodium carbonate and sonicating at 70% intensity in a Branson Digital sonicator Model 450 four times for 15 seconds whilst keeping cool on ice. After sonication the sample was stirred for 1 hour at 4°C. The sample was centrifuged at 115000 x g for 75 minutes at 4°C. The pellet was re-suspended in 50 mM Tris pH 7.3 with the assistance of an ultrasonic bath. The centrifugation and re-suspension were repeated another two times. The final pellet was re-suspended in 2-5 ml of 7 M urea, 2 M thiourea, 1% C7, 40 mM Tris; TBP was added to a final concentration of 5 mM and the sample incubated at room temperature for 1 hour. Acrylamide was added at a final concentration of 10 mM for 1 hour and a protein estimation performed to obtain a final concentration of 4 mg/ml. Before re-hydration of the IPG strips, the sample was ultra-sonicated for 2 minutes and then centrifuged at 21000 x g for 5 minutes. The supernatant was collected and 10 μl of Orange G finally added as an indicator dye.
Next, dry 24 cm IPG strips (Amersham-Pharmacia Biotech., Uppsala, Sweden) were re-hydrated for 6 hr with 400 μl of protein sample. The re-hydrated strips were focused on a Protean IEF Cell (Bio-Rad, Hercules, CA) for 120 kVhr at a maximum of 10 kV. The focused IPG strips were equilibrated for 20 minutes in 6 M urea, 2% (w/v) SDS, 50 mM Tris-HCl, pH 7.0.
Next, the equilibrated strips were inserted into loading wells of 6-15% (w/v) tris-acetate SDS-PAGE pre-cast prototype 10 cm x 15 cm GelChips™ (PSL, Sydney, Australia). Electrophoresis was performed at 50 mA for 1.5 hr. Proteins were stained overnight using Coomassie stain G-250, destained with 1% acetic acid, and gels rehydrated in water prior to gel spot excision.
Protein gel pieces were then excised using a prototype Xcise™ system (PSL, Sydney, Australia and Shimadzu-Biotech, Kyoto, Japan) and washed with 25 mM NH4HCO3, pH 8.5. The gel pieces were then dehydrated under vacuum for 15 minutes and digested with 10 μL of 20 μg/mL porcine trypsin in 25 mM NH4HCO3, pH 8.5, overnight at 30°C. Peptides were extracted from gel pieces with 10 μL of 0.5% (v/v) formic acid and sonication for 10 minutes. Prior to MALDI-TOF MS analysis, peptides were concentrated and purified using a C18 ZipTip® (Millipore, Bedford, MA), eluted onto a target plate in 2 μL of matrix solution and allowed to dry.
MALDI analysis
PMF (Peptide Mass Fingerprinting) and PSD (Post Source Decay) spectra were collected using an Axima-CFR MALDI-TOF mass spectrometer (Kratos, Manchester, UK) using time delayed extraction in reflectron mode. All spectra were internally two-point calibrated on trypsin autodigestion peaks or a standard peptide (ACTH) added to the matrix.
PMF and PSD database searching
In-house databases and tools (PSL, Sydney, Australia) and PeptIdent from the ExPASy Molecular Biology Server (http://www.expasy.ch/tools/peptident.html) were used to search the Swiss-Prot, TrEMBL and in-house databases for PMF and PSD analysis. Mass error tolerances of 100 ppm and 1.0 Da were used for PMF and PSD data, respectively.
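The two tolerance types quoted above behave differently: a ppm tolerance scales with mass, whereas a Da tolerance is absolute. The following is a minimal sketch of both checks; the function names and the example masses are illustrative assumptions and are not taken from the search tools named above.

```python
def within_ppm(observed_mass, theoretical_mass, tol_ppm=100.0):
    """Return True if the observed mass is within tol_ppm parts per million
    of the theoretical mass (relative tolerance, as used here for PMF data)."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tol_ppm


def within_da(observed_mass, theoretical_mass, tol_da=1.0):
    """Return True if the observed mass is within tol_da Daltons of the
    theoretical mass (absolute tolerance, as used here for PSD data)."""
    return abs(observed_mass - theoretical_mass) <= tol_da
```

At 1221.6 Da, a 100 ppm window corresponds to roughly ±0.12 Da, which is far tighter than the 1.0 Da window applied to PSD fragments.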
Results and Discussion
Pre-processing of spectra
Time-of-flight (TOF) mass spectrometers sample spectra, S, linearly in the time domain, t, and because TOF is proportional to the square root of the mass to charge (m/z) ratio of the ions, upon conversion of the spectrum to the mass domain the data is non-evenly spaced. This is a problem because it is easier to develop analysis procedures based on evenly sampled data, and in the case of TOF-MS the mass (m) difference between sample points is not constant. Hence, it is necessary to resample the mass spectra at even mass intervals.
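The uneven spacing follows directly from the square-root relationship. As a worked sketch, assume an idealised calibration with hypothetical constants a and t_0 (these are illustrative and are not given in the source):

$$t = a\sqrt{m/z} + t_0 \;\Rightarrow\; m/z = \left(\frac{t - t_0}{a}\right)^2, \qquad \frac{d(m/z)}{dt} = \frac{2(t - t_0)}{a^2} = \frac{2\sqrt{m/z}}{a}.$$

For a fixed sampling interval Δt, the spacing between adjacent points in the mass domain is therefore approximately (2Δt/a)√(m/z), which grows with the square root of the mass.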
The prior art document [1] teaches the use of linear interpolation to resample the original data; however, this has been found to be somewhat unsatisfactory, largely due to the underestimation of the peak heights, as shown in Figure 1a. Linear interpolation also leads to a smoothing of the data and even to loss of peak information. Cubic interpolation is another interpolation method that may be used. However, with cubic interpolation, as seen in Figure 1b, artificially high peaks can be introduced that lead to erroneous intensity values and peak positions. In comparison, we have found that nearest-neighbour interpolation, as illustrated in Figure 1c, preserves the peak intensity information, although, as with all interpolation methods, shifts in positional information can and do occur. However, nearest-neighbour interpolation tends to be the more reliable resampling routine, particularly due to the importance of peak height information when dealing with overlapping distributions, as will be discussed in more detail below. Examples of the nearest-neighbour interpolation applied to a peptide distribution are shown in Fig. 2b.
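Nearest-neighbour resampling onto an even mass grid can be sketched as follows. This is a minimal illustration using NumPy; the function name, the grid step and the toy data are assumptions rather than values from the source.

```python
import numpy as np


def resample_nearest(mass, intensity, step=0.05):
    """Resample an unevenly spaced spectrum onto an even mass grid by copying
    the intensity of the nearest original sample (no averaging, so every
    output value is an original intensity)."""
    mass = np.asarray(mass, dtype=float)          # must be sorted ascending
    intensity = np.asarray(intensity, dtype=float)
    grid = np.arange(mass[0], mass[-1] + step / 2, step)
    # Index of the first original sample at or above each grid point.
    right = np.searchsorted(mass, grid).clip(1, len(mass) - 1)
    left = right - 1
    # Pick whichever neighbour is closer; ties go to the left sample.
    use_right = (mass[right] - grid) < (grid - mass[left])
    nearest = np.where(use_right, right, left)
    return grid, intensity[nearest]
```

Because output values are copied rather than blended, a peak apex is never lowered by averaging with a neighbouring sample, which is the property described above; the trade-off is that an apex may shift by up to half the local sample spacing.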
Mathematical Morphology
Following resampling, mathematical morphology is applied (see references [8], [9], [10], the contents of which are incorporated by reference) as it allows the design of filters that accurately focus in on peak widths and on the distance between peaks. Basically, a morphological filter is any filter Φ which is idempotent, $\Phi(\Phi(g)) = \Phi(g)$; increasing, $g \ge f \Rightarrow \Phi(g) \ge \Phi(f)$; and either anti-extensive, $\Phi(g) \le g$, or extensive, $\Phi(g) \ge g$. A mathematical morphology filter is any morphological filter that can be expressed in terms of erosions and dilations.
The erosion ε of a spectrum S with a structuring element B is denoted by:
$$\varepsilon_B(S)(x) = \min_{b \in B} S(x + b), \qquad (1)$$
where the structuring element B is flat and is simply a line of some specified length, l, with the origin at its centre; that is, B = {-l/2, ..., -2, -1, 0, 1, 2, ..., l/2}.
The dilation δ of a spectrum S with a structuring element B is denoted by:
$$\delta_B(S)(x) = \max_{b \in B} S(x + b). \qquad (2)$$
These two basic transformations are often combined to produce openings:
$$\gamma_B(S) = \delta_{\check{B}}(\varepsilon_B(S)) \qquad (3)$$
and closings:
$$\varphi_B(S) = \varepsilon_{\check{B}}(\delta_B(S)), \qquad (4)$$
where $\check{B} = \{-b : b \in B\}$ is the symmetric set of B with respect to its origin. Any closing is the dual of a particular opening and vice-versa; that is:
$$\varphi_B(S) = \left(\gamma_B(S^c)\right)^c, \qquad (5)$$
where $f^c$ represents the complement of f, $f^c(x) = t_{max} - f(x)$, and $t_{max}$ is the maximum of f. Figure 3 provides an example of an opening (a) and a closing (b). Note that an opening attenuates peaks, while a closing fills in troughs. The amount of attenuation or filling is determined by the length of the line r, as seen in Fig. 3.
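The four operators above can be sketched for a flat, centred line structuring element as running minima and maxima. This is a minimal NumPy illustration, not the fast algorithm of reference [11]; the function names are assumptions, the window length is measured in samples, and an even length is rounded up to the next odd length so that the window stays centred.

```python
import numpy as np


def _running(s, length, reducer):
    """Apply `reducer` (np.min or np.max) over a centred window of 2*(length//2)+1
    samples. Edge padding repeats the boundary value, which is equivalent to
    ignoring out-of-range positions for both min and max."""
    half = length // 2
    padded = np.pad(np.asarray(s, dtype=float), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    return reducer(windows, axis=1)


def erode(s, length):
    return _running(s, length, np.min)       # running minimum, as in Eq. 1


def dilate(s, length):
    return _running(s, length, np.max)       # running maximum, as in Eq. 2


def opening(s, length):
    return dilate(erode(s, length), length)  # erosion then dilation: removes narrow peaks


def closing(s, length):
    return erode(dilate(s, length), length)  # dilation then erosion: fills narrow troughs
```

Because the line is symmetric about its origin, the structuring element and its reflection coincide, so the same window serves for both steps of an opening or closing.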
Spectrum Cleaning
A prominent feature in raw spectra is the common presence of a background trend, where the low mass range of the spectrum does not reach the baseline, as illustrated in Fig. 4a.
Typically, large openings are found to be good for estimating the background level; however, better results are achieved by constructing filters that alternate between closings and openings. For example, an estimate of the lower envelope, L, of the spectrum can be obtained via:
$$L(S) = \varepsilon_x(\gamma_{x+y}(\varphi_y(S))) \qquad (6)$$
where the lengths of x, y are typically 100 and 11 Da respectively. While these values may seem large, these transformations can be computed very efficiently [11].
The background corrected spectrum, C(S), shown in Fig. 4b, was obtained via
$$C(S)(m) = \begin{cases} 0 & \text{if } S(m) - L(S)(m) < 0 \\ S(m) - L(S)(m) & \text{otherwise} \end{cases} \qquad (7)$$
To determine the noise level, N, in the spectrum, we first compute the upper envelope U of the spectrum by:
$$U(S) = \delta_x(\varphi_{x+y}(\gamma_y(S))) \qquad (8)$$
and then the noise level is computed via:
$$N(S)(m) = \begin{cases} 0 & \text{if } U(S)(m) - L(S)(m) < 0 \\ U(S)(m) - L(S)(m) & \text{otherwise} \end{cases} \qquad (9)$$
Figure 5 shows the estimated noise level in a region of a MALDI-TOF spectrum, in which the line 10 represents the noise level, which can vary from point to point as given by Equation 9.
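The background correction step can be sketched as below. As a loudly stated simplification, the lower envelope is approximated here by a single large opening (which the text notes is already a reasonable background estimate) rather than by the alternating filter of Eq. 6; the function names and the window length in samples are assumptions.

```python
import numpy as np


def _running(s, length, reducer):
    """Running min or max over a centred window of 2*(length//2)+1 samples."""
    half = length // 2
    padded = np.pad(np.asarray(s, dtype=float), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    return reducer(windows, axis=1)


def opening(s, length):
    """Flat opening: erosion (running min) followed by dilation (running max)."""
    return _running(_running(s, length, np.min), length, np.max)


def subtract_background(s, window):
    """Background-correct a spectrum in the manner of Eq. 7.

    ASSUMPTION: the lower envelope is a plain opening, standing in for L(S)."""
    s = np.asarray(s, dtype=float)
    lower = opening(s, window)
    # Clip negative differences to zero as in Eq. 7. With a plain opening the
    # difference is never negative, so the clip only matters for other envelopes.
    return np.maximum(s - lower, 0.0)
```

The same clipped difference, applied to an upper and a lower envelope instead of the spectrum and the lower envelope, gives a point-wise noise estimate of the kind described by Eq. 9.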
Spectrum Segmentation
Following the above signal cleaning, small irrelevant maxima are removed by applying a small opening. Next, regional maxima are extracted. Regional maxima are defined as points or groups of points that have intensity values that are strictly greater than their neighbouring points. The extracted maxima are then used as seeds for performing a one dimensional (1D) watershed segmentation [10] on the spectrum, so as to obtain isolated peaks as shown in Fig. 6b.
This information is then further reduced to a stick representation by replacing each isolated peak (Fig. 6b) by a stick at its centroid position determined at 70% of the peak's maximum height. The stick representations are shown in Fig. 7.
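The segmentation and stick reduction described above can be sketched as follows. This is a simplified illustration under stated assumptions: the 1D watershed is reduced to cutting at the lowest point between adjacent regional maxima, the centroid is the intensity-weighted mean of the samples at or above the chosen fraction of the apex, and the stick height is taken to be the apex height (the source does not state the stick height). Function names are hypothetical.

```python
import numpy as np


def regional_maxima(y):
    """Return one index per regional maximum: a point or flat plateau whose
    neighbours on both sides are strictly lower. Zero-height plateaus are skipped."""
    y = np.asarray(y, dtype=float)
    maxima, i, n = [], 0, len(y)
    while i < n:
        j = i
        while j + 1 < n and y[j + 1] == y[i]:
            j += 1                                  # extend across a flat plateau
        left_lower = i == 0 or y[i - 1] < y[i]
        right_lower = j == n - 1 or y[j + 1] < y[i]
        if left_lower and right_lower and y[i] > 0:
            maxima.append((i + j) // 2)             # plateau centre
        i = j + 1
    return maxima


def sticks(mass, y, fraction=0.70):
    """Reduce a cleaned spectrum to (centroid mass, height) sticks."""
    mass = np.asarray(mass, dtype=float)
    y = np.asarray(y, dtype=float)
    seeds = regional_maxima(y)
    # 1D watershed: basins grown from adjacent seeds meet at the lowest point between them.
    inner = [a + int(np.argmin(y[a:b + 1])) for a, b in zip(seeds, seeds[1:])]
    cuts = [0] + inner + [len(y) - 1]
    result = []
    for seed, lo, hi in zip(seeds, cuts, cuts[1:]):
        seg_mass, seg_y = mass[lo:hi + 1], y[lo:hi + 1]
        keep = seg_y >= fraction * y[seed]          # samples above the cut height
        centroid = np.sum(seg_mass[keep] * seg_y[keep]) / np.sum(seg_y[keep])
        result.append((float(centroid), float(y[seed])))
    return result
```

Raising the fraction towards 100% makes the centroid track the apex more closely, while lowering it makes the position more sensitive to peak tails and shoulders.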
Harvesting Monoisotopic peaks
From each MALDI-TOF spectrum it is necessary to extract the monoisotopic peak of each peptide isotope distribution in order to obtain the peptide masses. The isotopic distributions arise due to the naturally occurring presence of ²H, ¹³C, ¹⁵N, ¹⁷O and ¹⁸O elemental isotopes in the peptides analysed. To harvest the monoisotopic peaks, we use a Poisson model approach [1]. Basically, the Poisson model is a probability distribution that we use to relate the number of atoms, n, to the proportion p of its isotopes:
$$P(k; M) = \frac{e^{-M} M^k}{k!} \qquad (10)$$
where M, the mean of the distribution, represents the product np.
Since there are many isotopic distributions and we will not in general know the appropriate values of n and p, a mapping function F : m → M, where m is mass, is derived by defining a hypothetical average amino acid u = C10H16N3O3, scaled to whole numbers, forming peptides composed of repeating units u from 1 to 15 (corresponding to peptide masses between 245.1376 and 3410.8059 Da), and deriving the mapping function:
$$F(m) = 0.000594\,m - 0.03091 \qquad (11)$$
Since we were able to capture our optimisation steps [1] in a simple regression type equation (Eq. 11) and the evaluation of Equation 10 is fast, we are able to define an average peptide isotopic distribution for any value of m in real time. Other techniques are considerably more involved and do not readily lend themselves to analysis of higher mass peptides [3] or are slow for real time analysis [2], and hence require further sampling and estimation procedures.
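Combining the mass-to-mean mapping of Eq. 11 with the standard Poisson probability mass function gives a theoretical isotope pattern for any peptide mass. The sketch below is illustrative: the function names, the `max_k` bound and the example mass are assumptions, and the 0.001 cutoff mirrors the threshold P(k, M) > 0.001 used later in the text.

```python
import math


def mean_isotope_count(mass):
    """Eq. 11: map a peptide mass m (Da) to the Poisson mean M.
    Only meaningful where the result is positive (masses above about 52 Da)."""
    return 0.000594 * mass - 0.03091


def poisson(k, mean):
    """Poisson probability of isotope k (k = 0 is the monoisotopic peak)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def isotope_pattern(mass, cutoff=0.001, max_k=50):
    """Return (k, P(k; M)) for every isotope whose probability exceeds the cutoff."""
    mean = mean_isotope_count(mass)
    return [(k, poisson(k, mean)) for k in range(max_k) if poisson(k, mean) > cutoff]
```

For a 1000 Da peptide the mean is about 0.56, so the monoisotopic peak dominates; for a 2800 Da peptide the mean is about 1.63 and the first isotope (k = 1) becomes the tallest peak in the pattern.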
This approach could be redefined by using a different representation of u, thereby extending the application of this peak harvesting tool to a greater number of applications, including polymers and glycoproteins or any other system where a repeating unit (u) can be characterised.
Fitting the Poisson model
There are two predominant variables to consider when fitting the Poisson distribution to a stick representation of a mass spectrum (Fig. 7). These are the presence of overlapping isotopic distributions, illustrated in Figure 8, and the alignment error in the isotope separations, as they often deviate from exactly 1.0 Da.
The alignment error is generally due to limitations of mass resolution, peak asymmetry and sample rate. To compensate for the alignment error (e), at each candidate monoisotopic position, S(m), the signal upstream is inspected for potential isotopes. This is achieved by collecting at 1 Da intervals the largest stick value within a specified alignment error range, ± e, centred at each Da position with respect to the monoisotopic peak's mass position m. This is illustrated in Figure 9, where the peak at position 0 represents a monoisotopic peak, and the locations for isotopes at 1 and 2 Daltons upstream are marked:
$$S_e(m) = \max_{x = m - e}^{m + e} S(x). \qquad (12)$$
Here, S is the stick representation of the original spectrum, Fig. 7. The next step is to fit the Poisson model (Eq. 10) to each isotope k, where P(k, M) > 0.001, in the distribution for mass m:
Figure imgf000012_0001
where p(m;k) = S (m)A(k.M) ∞AA(k.M) = gg .
We now need to measure how well the extracted Poisson model $\hat{P}$ fits the theoretical Poisson model P at mass m:
$$H(m) = \frac{\sum_{k} P(k;M)\,\hat{P}(m;k)}{\left|\{k : P(k;M) > 0.001\}\right|} \qquad (14)$$
H can be viewed as representing the weighted average height of the peaks within the distribution P. The height of the distribution can then be compared with the noise level at that location in the spectrum:
Figure imgf000012_0002
I is an indicator function for whether the average height of the distribution exceeds the level of the noise N by a specified amount z. All distributions that pass the threshold are considered to be valid distributions: their monoisotopic mass is recorded and the contribution of that distribution to the entire signal is subtracted. The procedure then continues from the next position in the signal, examining each further peak as a candidate monoisotopic peak. Note that this approach inherently handles overlapping distributions [1], as illustrated in Fig. 8.
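The isotope collection step (Eq. 12) that feeds the fit can be sketched on its own: for each isotope position k Da upstream of a candidate monoisotopic mass, take the tallest stick within ± e. The function name, argument layout and the stick values in the usage note are hypothetical, loosely modelled on the alignment discussion rather than taken from measured data.

```python
import bisect


def collect_isotopes(stick_masses, stick_heights, mono_mass, n_isotopes, e=0.1):
    """For k = 0 .. n_isotopes-1, return the largest stick height found within
    +/- e Da of mono_mass + k. `stick_masses` must be sorted ascending;
    a position with no stick in range contributes 0.0."""
    collected = []
    for k in range(n_isotopes):
        centre = mono_mass + k * 1.0
        lo = bisect.bisect_left(stick_masses, centre - e)
        hi = bisect.bisect_right(stick_masses, centre + e)
        collected.append(max(stick_heights[lo:hi], default=0.0))
    return collected
```

With sticks at 1193.68, 1194.80 and 1195.75 Da, an error range of 0.1 Da misses the second stick (it lies 0.12 Da from the expected position), whereas 0.2 Da captures all three; this shows how a tolerance that is too tight can cause a genuine distribution to be rejected.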
Post Processing: Heuristic
The threshold approach described above (Eq. 15) generally picks a superset of monoisotopic peaks, depending on the chosen value of z. In order to ensure the peaks are real, the list of monoisotopic peaks is further scanned to identify peaks that are within a few Daltons of each other. For all peaks that satisfy this condition the following criteria are used to eliminate the noise: if the monoisotopic peak sp at position (k) has a neighbour (k + 1) that is less than 3 Da away, sp(k) is a valid peak iff sp(k)/sp(k + 1) > 0.2. Similarly, if sp(k) has a neighbour (k - 1) that is less than 3 Da away, it is a valid peak iff sp(k)/sp(k - 1) > 0.6.
This treatment helps ensure that any distribution that overlaps another is not simply due to model misfitting or low signal.
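The neighbour heuristic can be sketched as a single pass over the harvested list. One assumption must be flagged: the ratios tested are taken to be the height of the peak divided by the height of its neighbour, which is an interpretation of the criteria and not confirmed by the source. The function name and the example values are hypothetical.

```python
def filter_close_neighbours(peaks, max_gap=3.0, next_ratio=0.2, prev_ratio=0.6):
    """Filter harvested monoisotopic peaks, given as (mass, height) pairs sorted
    by mass with positive heights.

    ASSUMPTION: the tested ratio is height(k) / height(neighbour)."""
    kept = []
    for i, (mass, height) in enumerate(peaks):
        valid = True
        # Neighbour k + 1 less than max_gap Da above this peak.
        if i + 1 < len(peaks) and peaks[i + 1][0] - mass < max_gap:
            valid = valid and height / peaks[i + 1][1] > next_ratio
        # Neighbour k - 1 less than max_gap Da below this peak.
        if i > 0 and mass - peaks[i - 1][0] < max_gap:
            valid = valid and height / peaks[i - 1][1] > prev_ratio
        if valid:
            kept.append((mass, height))
    return kept
```

Under this reading, a candidate sitting just above a much taller peak must reach 60% of that peak's height to survive, which removes candidates that are more plausibly isotopes of the preceding distribution.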
Application to data analysis
Reference [1] discusses the successful application of peak picking software to low concentration standard peptide analysis and BSA in-gel digests, illustrating the fitting of the Poisson distribution to actual peptide distributions for a range of peptide masses.
In the following examples the routines of the present invention are applied to peptide mass fingerprinting (PMF) data generated directly from a PVDF surface (on which blotted proteins have been digested) and to the detection of overlapping distributions from PMF data collected following 2DE of platelet proteins. The following examples also demonstrate peak harvesting of post source decay (PSD) data collected following 2DE of platelet proteins.
Analysis of PMF spectra of platelet proteins
A MALDI-TOF mass spectrum was collected from a Coomassie stained protein spot isolated from a 2DE of human platelet proteins, shown in Fig. 10. A peak list from the resultant mass spectrum was processed using a maximum alignment error of 0.1 Da (the optimum error for PMF data). The peak list was searched against PMF databases using the ExPASy PeptIdent tool and the protein was identified as Chain 1: Integrin Beta-3 (Accession number P05106), with 26 peptides matched, covering 31% of the protein sequence.
The spectrum shown in Fig. 10 contains a large number of tryptic peptide peaks, resulting in a number of regions of overlapping distributions. Expanded views of two of these regions are illustrated in Figures 11a and 11b, and in both cases the method of the present invention identifies multiple isotopic distributions. The monoisotopic masses derived and corresponding peptide sequences (identified by PeptIdent) are shown in Table 1. All five sequences were matched to tryptic peptides of Chain 1: Integrin Beta-3.
In the first example, Fig. 11a, two overlapping distributions (1221.59, 1223.58 Da) can be easily seen due to their differences in intensity. Both of these distributions were detected by the method of the present invention and database searching identified the peptide masses as sequences of Chain 1: Integrin Beta-3 with no tryptic missed cleavages or other modifications. Further examples of overlapping distributions, shown in Fig. 11b, are found at 1531.82, 1532.83 and 1533.81 Da. These masses were again found to be tryptic peptides from Chain 1: Integrin Beta-3 with no tryptic missed cleavages. We believe the peptide at mass 1533.81 is due to the deamidation of one of the glutamine residues of the 1532.83 sequence. This preliminary assignment of the 1533.81 peptide requires further mass spectrometric confirmation to definitively assign the sequence (our standard practice is to use PSD analysis, which cannot be applied for multiple peptide distributions within approximately 5-10 Da of each other). Nonetheless, the detection of these three overlapping distributions highlights the ability of the method to extract complex distributions from MALDI-TOF data.
Table 1: Monoisotopic masses harvested and corresponding peptide sequences matched for the five overlapping distributions shown in Figure 11.
* Where one of the Q residues is potentially deamidated
Figure imgf000015_0001
Analysis of standard peptides on PVDF membrane
MALDI analysis of peptide digests directly from PVDF membranes is inherently challenging due to the non-flat nature of the membrane surface and lack of conductivity across the membrane. Both features can lead to broader peaks than those obtainable with standard MALDI data collection from a flat metal target. Here, we use data collected from PVDF membranes to demonstrate the ability of the method to harvest such data, despite variation from perfect behaviour.
To account for the reduced resolution typical of peptides collected directly from a PVDF surface, the mass spectra of BSA from PVDF were all harvested with an alignment error of 0.2 Da. BSA peptides were generated from tryptic digests on bands of BSA blotted from 1DE onto a PVDF membrane. The 1DE wells were loaded with BSA protein concentrations of 500, 250 and 100 fmol. Less than one eighth of the protein band was digested on the membrane, the equivalent of 62.5, 31.2 and 12.5 fmol of whole protein, respectively. Figure 12 shows the peaks of masses 1193.68, 1439.85 and 1479.89 Da after processing with the peak harvester. The theoretical Poisson distribution is drawn inside the actual distribution of the mass spectrum. Note that the leftmost peak in each case is the monoisotopic mass.
For the 62.5 fmol sample (Fig. 12, upper row) all peaks have high signal to noise ratios with near baseline separation. The actual isotopic distributions map well to the Poisson distributions. At lower concentrations, for example the harvesting of the 1439.85 Da peak for the 12.5 fmol sample (Fig. 12, bottom row), the actual isotopic distribution starts to deviate from the Poisson distribution, although in this case it is still an acceptable match for successful harvesting.
To assess the importance of the alignment error, the BSA data was re-harvested with an alignment error of 0.1 Da. In this case, all peptides were correctly harvested with the exception of the 1193.68 Da peptide in the 12.5 fmol sample, as shown in Figure 13. The distribution of the 1193.68 Da peptide did not fall within the 0.1 Da alignment error and this peptide was rejected as a potential monoisotopic candidate. The next mass at 1194.80 Da was then considered as a potential monoisotopic candidate and, in this case, the isotopic distribution fitted the Poisson distribution, so the 1194.80 Da peptide was incorrectly assigned as a monoisotopic mass. Fortunately, the system described herein is adequately flexible to allow peak harvesting parameters to be optimised and preset for each type of system being analysed (i.e. our standard procedure is to harvest MALDI-TOF data with an alignment error of 0.1 Da and PSD data with an alignment error of 0.2 Da), therefore removing the need for users to search with multiple parameter combinations for each sample.
Analysis of PSD spectra of platelet proteins
The final example of the method illustrates its ability to harvest peaks from PSD mass spectra. In some ways these spectra tend to be more complicated than PMF data because the spectra generally include a lot of very small broad peaks and, due to the broader range of laser powers used, isotopic resolution varies from well resolved to poorly resolved (or completely unresolved) peptide fragments.
The processing of this data also differs from that used for PMF analysis. Unlike the approach used with PMF data, where the Poisson model is fitted to a stick representation of the spectra as shown in Fig. 7, for PSD the spectrum is not reduced to sticks after segmentation (Section 3.4); instead, all peaks are selected which are wider than 1 Dalton and above the noise level (Eq. 9). From these peaks, we find the location of their maximum and, given their mass, we then fit a Poisson distribution such that the distribution's maximum intensity aligns with the maximum of the PSD peak; see Figures 15 and 17. The mass of the monoisotope is then easily derived from this distribution by taking the mass of the distribution's monoisotope.
PMF analysis was carried out on two more protein spots from 2DE of platelet proteins. The database searching results indicated that the two protein spots analysed contained Chain 1: Actin, cytoplasmic 1 or 2 (accession numbers P02570 and P02571, respectively, molecular weight 41605 Da, pI 5.29) and Beta tubulin 1, class VI, dJ543J19.4 (accession number Q9H4B7, molecular weight 50296 Da, pI 5.05), as summarised in Table 2.
The two forms of the Chain 1: Actin identified, P02570 and P02571, differ only in the starting sequence of the first 9 residues. However, as none of these residues was identified in the PMF analysis, we cannot distinguish between the presence of either (or both) proteins. PSD was carried out on one peptide from each protein to confirm the identification (see Table 2). In the PSD analysis of Actin, fragmentation was carried out on the 1791.05 Da peptide, resulting in broad fragments with almost no isotopic resolution. These fragment peaks were detected using the method with harvesting parameters previously optimised for PSD data. The PSD mass spectrum, shown in Fig. 14, shows a large number of PSD fragments. A few examples demonstrating the successful harvesting of these unresolved PSD fragments are shown in Figure 15.
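For unresolved PSD peaks, aligning the maximum of the theoretical distribution with the observed peak maximum amounts to stepping back from the apex by the index of the most intense isotope. The sketch below is one possible reading, under two stated assumptions: the Poisson mean is taken from Eq. 11 evaluated at the apex mass, and isotopes are spaced exactly 1 Da apart. The function name and example masses are hypothetical.

```python
import math


def monoisotopic_from_apex(apex_mass):
    """Estimate the monoisotopic mass of an unresolved peak from its apex mass.

    The most intense isotope of a Poisson distribution with non-integer mean M
    is k = floor(M); the monoisotopic mass is taken to lie k Da below the apex."""
    mean = 0.000594 * apex_mass - 0.03091      # Eq. 11, evaluated at the apex (assumption)
    mode = max(0, math.floor(mean))            # index of the tallest isotope
    return apex_mass - mode * 1.0, mode
```

For a fragment with its apex near 982 Da the mean is about 0.55, so the apex is itself taken as the monoisotopic position; for an apex near 2800 Da the mean is about 1.63 and the estimate moves 1 Da lower.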
Table 2: Results of the PMF and corresponding PSD analysis of two proteins derived from 2DE of human platelets
Figure imgf000017_0001
The resultant peak list from the peak harvester was analysed using an in house PSD database search engine. The results showed that the peptide fragments matched to 28 b and y fragments of the peptide sequence SYELPDGQVITIGNER of the protein Actin (cytoplasmic 1 or 2).
In the second example, Fig. 16, the PSD data is better resolved and therefore provides a good comparison of the ability of the method to successfully harvest PSD spectra despite large variability in the nature of that data. PSD analysis was carried out on the 1130.67 Da peptide from Beta tubulin 1, class VI, dJ543J19.4 and the resultant spectrum was harvested with the same parameters as used for the Actin example above. Examples of the fragments harvested with the peak harvester are shown in Fig. 17. Here, the peaks at masses 429 and 982 include some resolution of the second isotope. The method has successfully derived the near monoisotopic masses (within 1 Da of the database mass). The resultant peaks were searched with an in-house PSD database search engine, resulting in the identification of 17 b and y fragments from the peptide sequence FPGQLNADLR of the protein Beta tubulin 1, class VI, dJ543J19.4.
Thus the present invention provides improved tools and methods for selecting monoisotopic peaks from data collected directly from PVDF membranes and the preferred inclusion of post source decay harvesting produces improved results. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
References
[1] Breen E.J., Hopwood F.G., Williams K.L., and Wilkins M.R. Automatic Poisson peak harvesting for high throughput protein identification. Electrophoresis, 21:2243-2251, 2000.
[2] Gras R., Muller M., Gasteiger E., Gay S., Binz P.-A., Bienvenut W., Hoogland C., Sanchez J.-C., Bairoch A., Hochstrasser D.F., and Appel R.D. Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection. Electrophoresis, 20:3535-3550, 1999.
[3] Wehofsky M., Hoffmann R., Hubert M., and Spengler B. Isotopic deconvolution of matrix-assisted laser desorption ionisation mass spectra for substance-class specific analysis of complex samples. Eur. J. Mass Spectrom., 7:39-46, 2001.
[4] Wehofsky M. and Hoffmann R. Automated deconvolution and deisotoping of electrospray mass spectra. J. Mass Spectrom., 37:223-229, 2002.
[5] Horn D.M., Zubarev R.A., and McLafferty F.W. J. Am. Soc. Mass Spectrom., 11:320, 2000.
[6] Kyhse-Andersen J. Electroblotting of multiple gels: a simple apparatus without buffer tank for rapid transfer of proteins from polyacrylamide to nitrocellulose. J. Biochem. Biophys. Methods, 10:203-209, 1984.
[7] Sloane A. High-throughput peptide mass fingerprinting and protein macroarray analysis using chemical printing strategies. Mol. Cell. Proteomics, in press, 2002.
[8] Matheron G. Random Sets and Integral Geometry. John Wiley, New York, NY, USA, 1975.
[9] Serra J. Image Analysis and Mathematical Morphology. Academic Press, New York, NY, USA, 1982.
[10] Soille P. Morphological Image Analysis: Principles and Applications. Springer-Verlag, Berlin, 1999.
[11] Soille P., Breen E., and Jones R. A fast algorithm for min/max filters along lines of arbitrary orientation. In I. Pitas, editor, IEEE Workshop on Nonlinear Signal and Image Processing, volume II, pages 987-990, Neos Marmaras, June 1995.

Claims

CLAIMS:
1. A method for identifying monoisotopic peaks from mass spectrum data comprising the steps of: applying interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; reducing the data to a stick representation by replacing each isolated peak by a stick located at the centroid determined at a percentage of the peak's maximum height; compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's mass position, each stick value representing an isotope; fitting a Poisson model to each isotope thus collected; treating all distributions whose average height exceeds a noise level by a specified amount as valid; subtracting the contribution of that distribution from the overall signal; and repeating the process from the next position in the signal, examining each further peak as a candidate monoisotopic peak.
2. A method as claimed in claim 1 wherein the step of applying interpolation involves applying nearest neighbour interpolation.
3. A method as claimed in claim 1 or claim 2 wherein the step of treating all distributions whose average height exceeds a noise level by a specified amount as valid involves the steps of determining H, the weighted average height of the peaks within the distribution, as follows:
$$H(m) = \frac{\sum_{k} P(k;M)\,\hat{P}(m;k)}{\left|\{k : P(k;M) > 0.001\}\right|}$$
and comparing the height H of the distribution with the noise level at that location in the spectrum, the result being represented by I(m) as follows:
Figure imgf000020_0001
where I is an indicator function for whether the average height of the distribution exceeds the level of noise N.
4. A method as claimed in any preceding claim wherein the spectra are generated on matrix-assisted laser desorption ionisation time of flight (MALDI-TOF) mass spectrometers.
5. A method as claimed in any preceding claim wherein the percentage is from 50% to 100% of the peak's maximum height, most preferably from 65% to 75% of the peak's maximum height.
6. A method for identifying monoisotopic peaks from mass spectrum data comprising the steps of:- applying interpolation to resample the data while preserving peak intensity information; applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; selecting all peaks that have a width greater than about 1 Dalton and are above a predetermined noise level; finding the location of the maximum for each peak; given the peak's mass, fitting a Poisson distribution such that the distribution's maximum intensity aligns with the maximum of the peak; deriving the mass of the monoisotope from the distribution; and compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's mass position, each stick value representing an isotope.
7. A method as claimed in claim 4 wherein the step of applying interpolation includes applying a nearest neighbour type interpolation.
8. A system for identifying monoisotopic peaks from mass spectrum data comprising: means for applying interpolation to resample the data while preserving peak intensity information; means for applying mathematical morphology to clean the spectrum and to remove any small irrelevant peaks; means for extracting regional maxima from the cleaned spectra, being points or groups of points that have intensity values that are greater than their neighbouring points, and performing a one dimensional watershed segmentation on the spectrum using the extracted regional maxima as seeds, so as to obtain isolated peaks; means for reducing the data to a stick representation by replacing each isolated peak by a stick located at the centroid determined at a percentage of the peak's maximum height; means for compensating for alignment error by collecting at 1 Da intervals the largest stick value within a specified error range ± e centred at each Da position with respect to the monoisotopic peak's mass position, each stick value representing an isotope; means for fitting a Poisson model to each isotope thus collected; means for treating all distributions whose average height exceeds a noise level by a specified amount as valid; and means for subtracting the contribution of that distribution from the overall signal and repeating the process from the next position in the signal for examining each further peak as a candidate monoisotopic peak.
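
The stick reduction, the 1 Da isotope collection and the Poisson-weighted height recited in the claims above can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The function names, the intensity-weighted centroid, the use of the peak maximum as the stick height, the assumption of singly charged ions, and the `mean` parameter (the Poisson mean for a given mass, whose dependence on mass the claims do not state) are all assumptions made for illustration.

```python
import math


def centroid_stick(mz, intensity, fraction=0.7):
    """Collapse one isolated peak to a (mass, height) stick.

    Assumption: the centroid is the intensity-weighted mean m/z of the
    samples at or above `fraction` of the peak maximum, and the stick
    height is the peak maximum.
    """
    top = max(intensity)
    cut = fraction * top
    kept = [(m, i) for m, i in zip(mz, intensity) if i >= cut]
    total = sum(i for _, i in kept)
    return sum(m * i for m, i in kept) / total, top


def collect_isotopes(sticks, mono_mass, n_isotopes=4, e=0.2):
    """For k = 0 .. n_isotopes-1, take the tallest stick within
    mono_mass + k Da +/- e, or 0.0 if the window is empty."""
    heights = []
    for k in range(n_isotopes):
        centre = mono_mass + k  # 1 Da spacing (singly charged ions assumed)
        in_window = [h for m, h in sticks if abs(m - centre) <= e]
        heights.append(max(in_window, default=0.0))
    return heights


def poisson(k, mean):
    """Poisson probability P(k; mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def weighted_height(heights, mean, floor=0.001):
    """Poisson-weighted sum of the collected stick heights, keeping only
    the terms with P(k; mean) > floor."""
    return sum(poisson(k, mean) * h
               for k, h in enumerate(heights)
               if poisson(k, mean) > floor)
```

A caller would segment the spectrum into isolated peaks first, pass each peak to `centroid_stick`, and then call `collect_isotopes` with each stick in turn as the candidate monoisotopic mass. The default `fraction=0.7` sits inside the 65% to 75% range that claim 5 names as most preferred. The values of `e` and `n_isotopes` are arbitrary placeholders.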
PCT/AU2003/000878 2002-07-08 2003-07-08 Method and system for picking peaks for mass spectra Ceased WO2004006159A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003243824A AU2003243824A1 (en) 2002-07-08 2003-07-08 Method and system for picking peaks for mass spectra

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
AU2002950103 2002-07-08
AU2002950103A AU2002950103A0 (en) 2002-07-08 2002-07-08 Method and system for picking peaks for mass spectra
AU2002950064 2002-07-09
AU2002950064A AU2002950064A0 (en) 2002-07-09 2002-07-09 Method and system for picking peaks for mass spectra

Publications (1)

Publication Number Publication Date
WO2004006159A1 (en) 2004-01-15

Family

ID=30116380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2003/000878 Ceased WO2004006159A1 (en) 2002-07-08 2003-07-08 Method and system for picking peaks for mass spectra

Country Status (1)

Country Link
WO (1) WO2004006159A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001067485A1 (en) * 2000-03-07 2001-09-13 Amersham Biosciences Ab Mass spectral peak identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BREEN E.J. ET AL.: "Automated peak harvesting of MALDI-MS spectra for high throughput proteomics", SPECTROSCOPY, vol. 17, 2003, pages 579 - 595 *
BREEN E.J. ET AL.: "Automatic poisson peak harvesting for high throughput protein identification", ELECTROPHORESIS, vol. 21, 2000, pages 2243 - 2251 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8374993B2 (en) 2007-01-15 2013-02-12 Symetrica Limited Radioactive isotope identification
GB2445578B (en) * 2007-01-15 2009-01-07 Symetrica Ltd Radioactive isotope identification
GB2445578A (en) * 2007-01-15 2008-07-16 Symetrica Ltd Radioactive isotope identification
DE102010011974B4 (en) * 2010-03-19 2016-09-15 Bruker Daltonik Gmbh Saturation correction for ion signals in time-of-flight mass spectrometers
US9324544B2 (en) 2010-03-19 2016-04-26 Bruker Daltonik Gmbh Saturation correction for ion signals in time-of-flight mass spectrometers
DE102010011974A1 (en) * 2010-03-19 2011-09-22 Bruker Daltonik Gmbh Saturation correction for ion signals in time-of-flight mass spectrometers
US11373848B2 (en) 2010-03-19 2022-06-28 Bruker Daltonik Gmbh Saturation correction for ion signals in time-of-flight mass spectrometers
CN103197017A (en) * 2013-05-02 2013-07-10 云南烟草科学研究院 Cigarette smoke colour spectrum baseline correction MPLS (mathematically proven learning system) method
CN106530838A (en) * 2016-10-25 2017-03-22 合肥飞友网络科技有限公司 Aircraft flight path jump removing method
CN106530838B (en) * 2016-10-25 2019-04-09 飞友科技有限公司 A method for removing jumps in aircraft flight trajectory
CN110669104A (en) * 2019-10-30 2020-01-10 上海交通大学 Group of markers derived from human peripheral blood mononuclear cells and application thereof
CN110669104B (en) * 2019-10-30 2021-11-05 上海交通大学 A group of markers derived from human peripheral blood mononuclear cells and their applications
CN115241035A (en) * 2022-07-13 2022-10-25 安图实验仪器(郑州)有限公司 A mass axis correction method, device and medium thereof
CN115241035B (en) * 2022-07-13 2025-06-03 安图实验仪器(郑州)有限公司 A mass axis correction method, device and medium thereof
CN119246749A (en) * 2024-10-22 2025-01-03 浙江工业大学 A method for centroid conversion of UHPLC-HRMS profile mode data

Similar Documents

Publication Publication Date Title
Yergey et al. De novo sequencing of peptides using MALDI/TOF-TOF
Carpentier et al. Preparation of protein extracts from recalcitrant plant tissues: an evaluation of different methods for two‐dimensional gel electrophoresis analysis
Zhang et al. A universal algorithm for fast and automated charge state deconvolution of electrospray mass-to-charge ratio spectra
Chaurand et al. Profiling proteins from azoxymethane‐induced colon tumors at the molecular level by matrix‐assisted laser desorption/ionization mass spectrometry
US9390897B2 (en) Mass spectrometry
Zhang et al. ProbIDtree: an automated software program capable of identifying multiple peptides from a single collision‐induced dissociation spectrum collected by a tandem mass spectrometer
Kaur et al. Algorithms for automatic interpretation of high resolution mass spectra
Samuelsson et al. Modular, scriptable and automated analysis tools for high-throughput peptide mass fingerprinting
Moulédous et al. Navigated laser capture microdissection as an alternative to direct histological staining for proteomic analysis of brain samples
Wool et al. Precalibration of matrix‐assisted laser desorption/ionization‐time of flight spectra for peptide mass fingerprinting
Chevallet et al. Sweet silver: A formaldehyde‐free silver staining using aldoses as developing agents, with enhanced compatibility with mass spectrometry
Matthiesen et al. Database‐independent, database‐dependent, and extended interpretation of peptide mass spectra in VEMS V2. 0
Chassaigne et al. 2-Dimensional gel electrophoresis technique for yeast selenium-containing proteins—sample preparation and MS approaches for processing 2-D gel protein spots
WO2004006159A1 (en) Method and system for picking peaks for mass spectra
Johnson et al. Fourier‐transform mass spectrometry for automated fragmentation and identification of 5‐20 kDa proteins in mixtures
US20050221500A1 (en) Protein identification from protein product ion spectra
US9595427B2 (en) Acquisition of fragment ion mass spectra of biopolymers in mixtures
JP2008281411A (en) Protein database search method and recording medium
Müller et al. Visualization and analysis of molecular scanner peptide mass spectra
Breen et al. Automated Peak Harvesting of MALDI–MS spectra for high throughput proteomics
Avasarala et al. A distinctive molecular signature of multiple sclerosis derived from MALDI-TOF/MS and serum proteomic pattern analysis: detection of three biomarkers
CA2468689A1 (en) A system and method for automatic protein sequencing by mass spectrometry
Cottrell et al. The identification of electrophoretically separated proteins by peptide mass fingerprinting
Klarskov et al. Plasma desorption mass spectrometry of proteins transferred from gels after sodium dodecyl sulphate–polyacrylamide gel electrophoresis
US20020120404A1 (en) Methods and apparatus for mass fingerprinting of biomolecules

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP