WO2008138916A1 - Isolation of peptides and proteomics platform - Google Patents

Info

Publication number
WO2008138916A1
Authority
WO
WIPO (PCT)
Prior art keywords
peptides
protein
mixture
proteins
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2008/055802
Other languages
French (fr)
Inventor
Joël VANDEKERCKHOVE
Kris Gevaert
Petra Van Damme
Koen Sandra
Mahan Moshir
Robin Tuytten
Bart Ruttens
Wouter Laroy
Katleen Verleysen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pronota NV
Original Assignee
Pronota NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pronota NV

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/12 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length by hydrolysis, i.e. solvolysis in general
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01D SEPARATION
    • B01D15/00 Separating processes involving the treatment of liquids with solid sorbents; Apparatus therefor
    • B01D15/08 Selective adsorption, e.g. chromatography
    • B01D15/10 Selective adsorption, e.g. chromatography characterised by constructional or operational features
    • B01D15/18 Selective adsorption, e.g. chromatography characterised by constructional or operational features relating to flow patterns
    • B01D15/1864 Selective adsorption, e.g. chromatography characterised by constructional or operational features relating to flow patterns using two or more columns
    • B01D15/1871 Selective adsorption, e.g. chromatography characterised by constructional or operational features relating to flow patterns using two or more columns placed in series
    • B01D15/1878 Selective adsorption, e.g. chromatography characterised by constructional or operational features relating to flow patterns using two or more columns placed in series for multi-dimensional chromatography

Definitions

  • the invention relates to methods and apparatus for the enrichment and/or isolation of a subset of peptides out of complex mixtures of peptides.
  • the invention contemplates the enrichment and/or isolation of peptides which comprise the N-terminal ends or the C-terminal ends of proteins from which said complex mixtures of peptides are obtained.
  • the methods and apparatus of the invention are particularly applicable for qualitative and/or quantitative proteome analysis.
  • proteome is usually described as the entire complement of proteins found in a biological system, such as, e.g., a cell, tissue, organ or organism.
  • proteomics is concerned with the study of the proteome expressed at particular times and/or under internal or external conditions of interest. Proteomics approaches frequently aim at global analysis of the proteome, and require that large numbers of proteins, e.g., hundreds or thousands, can be routinely resolved and identified from a single sample.
  • Biomarker discovery usually involves comparing proteomes expressed in distinct physiological states, and identifying proteins whose occurrence or expression levels consistently differ between said physiological states.
  • proteolysis of complex biological samples can produce thousands of peptides, which may overwhelm the resolution capacity of known chromatographic and mass spectrometric systems, causing incomplete separation and impaired identification of the constituent peptides.
  • One manner to enable proteomic analysis of biological samples is to reduce the complexity of protein peptide mixtures generated by fragmentation of such samples, before subjecting said peptide mixtures to downstream resolving and identification steps, such as chromatographic separation and/or MS.
  • reducing the complexity of protein peptide mixtures will decrease the average number of distinct peptides present per individual protein of the sample, yet will maximise the fraction of proteins of the sample actually represented in the peptide mixture.
  • WO 02/077016 discloses a methodology ("COFRADIC") for qualitative and/or quantitative proteome analysis, wherein the complexity of the starting protein peptide mixture is reduced as follows: (a) the protein peptide mixture is separated into individual fractions of peptides using chromatography; (b) at least one amino acid of at least some of the peptides in each fraction is enzymatically and/or chemically altered, thus generating a subset of altered peptides and a subset of unaltered peptides for each fraction, and (c) the subset of altered peptides and the subset of unaltered peptides for each fraction are separated via chromatography and the particular subset of interest is isolated for further characterisation.
  • the chromatography of steps (a) and (c) is performed using the same type of chromatography, which allows comparison of the chromatographic properties of the altered peptides.
  • proteomic platforms which involve effective, robust and relatively simple (e.g., including a minimum of steps and optimally applied on a whole peptide digest) manners to decrease the complexity of peptide digests, coupled to appropriate steps for resolving and identification of constituent peptides, such as to facilitate comprehensive proteome analysis of complex samples.
  • the inventors contemplate that when a protein or a mixture of proteins, such as, e.g., proteins of a complex biological sample, are fragmented C-terminally adjacent to one or more specific amino acid residue types (herein generically denoted as amino acid residue types "X 1 ", "X 2 ",... "X n "), then the majority of peptides comprising the C-terminal ends of the starting proteins will not include said one or more amino acid residue types X 1 , X 2 ,... X n , unless any of the residue types X 1 , X 2 ,... X n was the actual C-terminal residue of the respective protein. In contrast, essentially all peptides originating from the N-terminal ends or from the internal portions of the starting proteins will comprise one of said one or more amino acid residue types X 1 , X 2 ,... X n as their last residue.
  • the invention takes advantage of this situation by reacting a peptide mixture obtained by said fragmentation with an agent capable of specifically modifying or removing said one or more amino acid residue types X 1 , X 2 ,... X n , followed by isolation of those peptides that have not been altered by said agent.
  • the so isolated, unaltered peptides are thus those that did not include any of said one or more amino acid residue types X 1 , X 2 ,... X n , and are therefore highly enriched for peptides comprising the C-terminal ends of the starting proteins.
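The selection principle described above can be illustrated with a minimal in-silico sketch. It assumes idealised, complete cleavage after every residue of type X and an agent that alters every peptide containing such a residue; the sequence and the helper names `fragment_after` and `unaltered_subset` are hypothetical and are not taken from the application.

```python
# Illustrative sketch only; the sequence and helper names are hypothetical.

def fragment_after(protein, residue_types):
    """Cleave C-terminally adjacent to every residue in residue_types
    (idealised, complete cleavage)."""
    peptides, start = [], 0
    for index, residue in enumerate(protein):
        if residue in residue_types:
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])  # remainder is the C-terminal peptide
    return peptides


def unaltered_subset(peptides, residue_types):
    """Peptides that an X-specific agent would leave unaltered,
    i.e. those lacking every residue type X."""
    return [p for p in peptides if not any(r in residue_types for r in p)]


protein = "MKTAYIAKQRLGLIEVQ"  # hypothetical protein sequence
residue_types = {"K", "R"}     # example choice of X 1 and X 2
peptides = fragment_after(protein, residue_types)
subset_s = unaltered_subset(peptides, residue_types)
print(peptides)
print(subset_s)
```

In this toy case only the peptide carrying the C-terminal end of the protein lacks K and R, so it is the only member of the unaltered subset. A protein whose actual C-terminal residue is K or R would not be recovered, as noted in the text.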
  • the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of: (a) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X 1 , X 2 ,...
  • the modification of the amino acid residue types X 1 , X 2 ,... X n in, or their removal from, the peptides comprising said amino acid residue types X 1 , X 2 ,... X n will suitably change the properties of the so altered peptides to allow the subset (S) of unaltered peptides (i.e., primarily peptides comprising the C-termini of the starting proteins) to be distinguished and isolated from the altered peptides.
  • the modification of the amino acid residue types X 1 , X 2 ,... X n in, or their removal from, the peptides comprising said residue types X 1 , X 2 ,... X n will change the chromatographic behaviour of the so altered peptides, allowing said subset (S) of unaltered peptides to be distinguished and isolated from the altered peptides by chromatography.
  • the inventors have recognised that the above method for isolating peptides comprising C-terminal ends of proteins may be very preferably used in conjunction with the overall method of WO 02/077016 A2, i.e., the step of altering the peptides comprising the amino acid residue types X 1 , X 2 ,... X n may be interposed between two chromatographic separations of the same type, wherein the peptide alteration step modifies the chromatographic behaviour of the altered peptides in the second chromatographic separation.
  • the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of:
  • step (b) isolating a subset (S) of peptides from said protein peptide mixture (PPM), comprising the steps of: (ba) separating the protein peptide mixture (PPM) into fractions of peptides via chromatography, (bb) reacting at least one and preferably each peptide fraction from step (ba) with an agent capable of specifically modifying or removing said one or more amino acid residue types X 1 , X 2 ,...
  • the protein (P) or the mixture of proteins (PM) are fragmented preferentially at peptide bonds C-terminally adjacent to either one (X 1 ) or to two or more different amino acid residue types X 1 , X 2 ,... X n . To reduce the chance that the actual C-terminal residue of the analysed proteins is any of the residue types X 1 , X 2 ,...
  • suitable frequency of cleavage may be preferably achieved when the fragmentation takes place C-terminally adjacent to one or more of the 20 common amino acid residue types found in natural proteins, and/or to one or more of residues obtained from any of the 20 common amino acid residue types by suitable modification of the starting proteins (e.g., modification of lysine to homoarginine).
  • the protein (P) or the mixture of proteins (PM) are fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X 1 , X 2 ,...
  • X 1 , X 2 ,... X n are chosen from the group consisting of: glycine (Gly, G), proline (Pro, P), alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), methionine (Met, M), cysteine (Cys, C), phenylalanine (Phe, F), tyrosine (Tyr, Y), tryptophan (Trp, W), histidine (His, H), lysine (Lys, K), arginine (Arg, R), glutamine (Gln, Q), asparagine (Asn, N), glutamic acid (Glu, E), aspartic acid (Asp, D), serine (Ser, S) and threonine (Thr, T), or a residue obtained from any of the above by suitable modification.
  • the present method may involve reacting a protein peptide mixture obtained by fragmentation of proteins C-terminally adjacent to one or more amino acid residue types X 1 , X 2 ,... X n , with an agent capable of specifically modifying said residue types X 1 , X 2 ,... X n .
  • the amino acid residue types X 1 , X 2 ,... X n may therefore be preferably selected from amino acid types whose side chains comprise comparably reactive moieties.
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X 1 , X 2 ,...
  • said one or more amino acid residue types X 1 , X 2 ,... X n comprise a moiety chosen from mercapto, methylthio, hydroxyphenyl, primary amino, secondary amino (including, inter alia, indyl, pyrrolidinyl and imidazyl, preferably indyl and imidazyl), guanidino, ureyl or carboxyl.
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X 1 , X 2 ,...
  • X 1 , X 2 ,... X n are chosen from: the group consisting of Met, Cys, Tyr, Trp, His, Pro, Lys, Arg, hArg, Glu and Asp; or the group consisting of Met, Cys, Tyr, Trp, His, Lys, Arg, hArg, Glu and Asp; or the group consisting of His, Lys and Arg; or the group consisting of Lys and Arg; or the group consisting of Met and Cys; or the group consisting of Tyr and Trp; or the group consisting of Asp and Glu.
  • fragmenting of the protein (P) or the mixture of proteins (PM) may be effected enzymatically, preferably by an endoproteinase, more preferably by trypsin.
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to amino acid residue(s) that comprise a guanidino moiety, such as, e.g., arginine and/or homoarginine, and/or C-terminally adjacent to lysine, wherein the lysine may be advantageously converted to homoarginine subsequent to the fragmentation.
  • peptides comprising a guanidino moiety will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides lacking a guanidino moiety will mostly originate from and comprise the C-terminal ends of the starting protein(s).
  • Homoarginine (hArg) may be preferably introduced to proteins before the fragmentation by a suitable modification of Lys. Alternatively, Lys may be preferably converted to hArg after the fragmentation.
  • Suitable modification of the guanidino moiety can discriminate those peptides of the protein (P) or protein peptide mixture (PPM) that comprise a guanidino moiety (altered by said modification) from those that do not (unaltered by said modification), and thereby allow isolation of the latter, mainly C-terminal, peptides.
  • the invention contemplates advantageous manners to modify peptides that include a guanidino moiety, such as, e.g., peptides with Arg or hArg, more preferably with Arg.
  • peptides comprising a guanidino moiety may be modified by reacting with an agent chosen from a dicarbonyl compound or derivative thereof (such as, e.g., preferably with an arylglyoxal, more preferably phenylglyoxal or hydroxyphenylglyoxal, or also very preferably with nitromalondialdehyde), a peptidylarginine deiminase or an arginase.
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to a basic amino acid residue, more preferably C-terminally adjacent to Arg and/or hArg and/or Lys.
  • PPM protein peptide mixture
  • Homoarginine (hArg) may be preferably introduced to proteins before the fragmentation by a suitable modification of Lys.
  • Removal of the basic last residue can discriminate those peptides of the protein (P) or protein peptide mixture (PPM) that contained such a basic last residue from those that did not, and thereby allow isolation of the latter, mainly C-terminal, peptides.
  • the invention contemplates advantageous manners to remove basic last residues, preferably Arg, hArg or Lys, from peptides, e.g., preferably, but without limitation, using carboxypeptidase B.
  • An added advantage of fragmenting proteins C-terminally adjacent to Arg and/or hArg and/or Lys, preferably to Arg and/or Lys, as above, is that such cleavage may be achieved using trypsin, which - due to its high specificity and efficiency of proteolysis - is a particularly preferred endoproteinase for proteomics applications (trypsin cleaves preferentially C-terminally adjacent to Arg and Lys and, to a lesser extent, after hArg).
  • embodiments which involve fragmenting of the protein (P) or the protein mixture (PM) preferentially C-terminally adjacent to a basic residue may benefit from an additional step to enrich for C-terminal peptides from the protein (P) or protein mixture (PM).
  • said additional step may be performed after the protein (P) or the protein mixture (PM) has been suitably fragmented and before the isolation of C-terminal peptides as disclosed in step (b) above.
  • the peptides comprising C-terminal ends of the respective proteins will in general have about zero net charge, whereas N-terminal and internal peptides will in general display net charge of about +1 or higher (departure from this situation may occur for peptides which, under said conditions, would contain additional charged side chain groups, such as, e.g., carboxyl, phosphate or sulphonate, or, where proteolysis does not take place after each basic residue (e.g., not after histidine, e.g., when trypsin is used), a charged basic group) (departure from the above general situation may also occur for N-terminal peptides derived from naturally N-terminally acetylated proteins).
  • This difference between the net charge of the peptides comprising the C-terminal ends of proteins and the remaining peptides of the protein peptide mixture (PPM) allows for separating said subsets of peptides and for isolating or at least enriching for the subset (S') of C-terminal peptides.
  • the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of: (i) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X 1 , X 2 ,... X n ) to obtain the protein peptide mixture (PPM), wherein said one or more amino acid residue types X 1 , X 2 ,...
  • Step (iii) may preferably comprise steps (ba), (bb) and (bc) as taught above. It shall be appreciated that the order of steps (ii) and (iii) can be reversed in the above method.
  • a workable method for isolating or enriching C-terminal peptides can already be obtained comprising steps (i) and (ii) and not step (iii).
  • addition of step (ii) to the methods of the invention disclosed herein can provide for a significant, synergistic boost in the number of C-terminal peptides that can be identified, compared with any of said methods alone.
  • the conditions of step (ii) are preferably such that the acidic side chain moiety (in particular the -COOH side chain moiety) of the majority of said acidic amino acids is not dissociated.
  • the pH may be preferably between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.25, and most preferably about 3.
  • said subsets of peptides may be resolved and the subset (S') isolated or enriched by cation exchange chromatography, preferably by strong cation exchange ("SCX") chromatography.
  • the charge of at least some or all of the basic amino acid types after which the proteolysis does not occur may be advantageously neutralised by suitable modification (e.g., acetylation for lysine, etc.).
  • this will thwart the effect that the presence of such basic amino acid types in peptides derived from the C-terminal ends of the proteins would have on the charge of said peptides, and thereby ensure an about zero net charge for a greater proportion of the C-terminal peptides and improve their enrichment in step (ii).
  • the N-terminal -NH2 groups of the protein (P) or the mixture of proteins (PM) may be blocked before the fragmentation, such that the so-blocked groups are not protonated under the conditions of said additional enrichment step (a considerable proportion of α-NH2 groups of proteins may already be acetylated in nature). Consequently, peptides comprising the N-terminal ends of the protein (P) or mixture of proteins (PM) will also generally display zero net charge under said conditions and may be isolated or enriched for alongside the peptides comprising the C-terminal ends in step (ii).
  • the invention generally provides a method for protein identification and optionally quantification from a protein mixture comprising the steps: (a) fragmenting a mixture of proteins (PM) to obtain a protein peptide mixture (PPM); (b) isolating from the protein peptide mixture PPM:
  • peptides comprising the C-terminal ends of proteins of the mixture of proteins PM (i.e., C-terminal peptides); (c) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
  • N-terminal peptides are isolated and analysed.
  • C-terminal peptides are isolated and analysed.
  • N-terminal and C-terminal peptides are isolated and analysed.
  • the inventors have realised advantageous manners that allow for robust and straightforward sorting of N-terminal and/or C-terminal peptides, even from relatively complex peptide mixtures.
  • the inventors have achieved conditions that allow for satisfactory resolution of so-isolated N-terminal and/or C-terminal peptides - even while said peptides can represent rather complex parent protein samples - so as to facilitate identification and optionally quantification of the constituent N-terminal and/or C-terminal peptides.
  • suitable resolution conditions may involve one-dimensional long-column chromatography, which can achieve adequate peptide resolution due to the increased column length.
  • a multidimensional separation process such as preferably but without limitation orthogonal 2D-chromatography, can also achieve satisfactory separation of the N-terminal and/or C-terminal peptides isolated herein. Accordingly, the methods and systems described herein advantageously allow for comprehensive proteomic analysis of considerably complex protein mixtures, e.g., protein mixtures obtained from relevant biological samples.
  • the invention contemplates ways in which N-terminal and/or C-terminal peptides can be isolated in step (b) from protein peptide mixtures (PPM) obtained by fragmentation of the starting protein mixtures (PM) using trypsin, which tends to be favoured in proteomics applications due to its high specificity and efficiency of proteolysis.
  • the mixture of proteins (PM) is fragmented using trypsin or trypsin-like protease to obtain the protein peptide mixture (PPM).
  • N-terminal and/or C-terminal peptides can be isolated or enriched herein from tryptic digests on the basis of a difference in net charge between the majority of or substantially all N-terminal and/or C-terminal peptides compared to the majority of or substantially all remaining peptides.
  • trypsin cleaves proteins C-terminally adjacent to Arg and Lys residues (except where the ensuing residue is Pro). Consequently, trypsin cleavage generates a protein peptide mixture (PPM) wherein the majority of or substantially all C-terminal peptides do not contain Arg or Lys (unless Arg or Lys was the last C-terminal residue of the corresponding protein), whereas the majority of or substantially all N-terminal and internal peptides do contain Arg or Lys as their last residue.
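The trypsin cleavage rule stated above (cleavage after Arg and Lys, except where the ensuing residue is Pro) can be sketched as follows. This is a simplified model assuming complete and fully specific cleavage; the function name `tryptic_digest` and the example sequence are hypothetical.

```python
# Simplified model of tryptic cleavage; names and sequence are hypothetical.

def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P
    (idealised: complete, fully specific cleavage)."""
    peptides, start = [], 0
    for index, residue in enumerate(protein):
        next_residue = protein[index + 1] if index + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


protein = "MASKPLTRGEWKAAVDNQ"  # hypothetical sequence containing a Lys-Pro bond
peptides = tryptic_digest(protein)
n_terminal_and_internal = peptides[:-1]
c_terminal = peptides[-1]
print(peptides)
```

In this example the Lys-Pro bond is not cleaved, every N-terminal or internal peptide ends in Arg or Lys, and the C-terminal peptide contains neither.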
  • Under conditions where substantially all Arg and Lys side chains are protonated (preferably under acidic conditions, more preferably at pH about 4.0 or less, even more preferably pH about 3.0 or less), N-terminal and internal peptides will thus in general carry an extra positive charge compared to C-terminal peptides. More in particular, the majority of or substantially all C-terminal peptides will display about zero net charge, whereas the majority of or substantially all N-terminal and internal peptides will show about +1 net charge.
  • Departures from this general situation may occur where C-terminal peptides contain Lys or Arg, e.g., as the last residue or as Lys-Pro or Arg-Pro; where peptides contain amino acids whose side chains may be charged under the above conditions, such as, e.g., His, Asp or Glu; where peptides contain other moieties that may be charged under the above conditions, such as, e.g., phosphate or sulphonate; where N-terminal peptides originate from naturally α-NH2-acetylated proteins and hence lack a protonated α-NH2 group under the above conditions; or where some N-terminal or internal peptides are produced by non-specific cleavage (non-tryptic peptides) and do not contain Arg or Lys.
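The net-charge reasoning above can be made concrete with a deliberately crude integer bookkeeping. The model below is an assumption chosen to reproduce the stated values of about zero and about +1: it counts +1 for a free, protonated α-amino group, +1 for each Arg, Lys or His side chain, and -1 for the α-carboxyl group, and it treats Asp and Glu side chains as undissociated. The function name and peptide sequences are hypothetical.

```python
# Crude integer charge model (an assumption, not a measured quantity).

BASIC_RESIDUES = set("KRH")  # side chains taken as protonated at acidic pH


def approx_net_charge(peptide, n_terminus_blocked=False):
    """Approximate net charge of a peptide under acidic conditions."""
    charge = 0 if n_terminus_blocked else 1                  # alpha-amino group
    charge += sum(r in BASIC_RESIDUES for r in peptide)      # Arg, Lys, His
    charge -= 1                                              # alpha-carboxyl group
    return charge


c_terminal_charge = approx_net_charge("AAVDNQ")       # C-terminal peptide
internal_charge = approx_net_charge("GEWK")           # internal tryptic peptide
missed_lys_pro_charge = approx_net_charge("MASKPLTR") # uncleaved Lys-Pro bond
his_c_terminal_charge = approx_net_charge("AHVDNQ")   # C-terminal peptide with His
print(c_terminal_charge, internal_charge, missed_lys_pro_charge, his_c_terminal_charge)
```

The last two peptides illustrate the departures listed in the text: an uncleaved Lys-Pro bond raises the charge above +1, and a His-containing C-terminal peptide no longer has zero net charge in this model.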
  • C-terminal peptides may be isolated or enriched from a protein mixture (PM) using steps comprising: proteolysing a mixture of proteins (PM) by trypsin or trypsin-like protease to obtain a protein peptide mixture (PPM); isolating, under conditions where substantially all Arg and Lys side chains are protonated, a subset of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset have about zero net charge under said conditions.
  • said conditions may encompass acidic conditions, more preferably pH about 4.0 or less, even more preferably pH about 3.0 or less, such as, e.g., pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5.
  • Such relatively low pH may advantageously also ensure that the side chain -COOH groups of Asp and Glu do not dissociate and thus do not influence the peptide net charge.
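The effect of pH on side-chain dissociation can be estimated with the Henderson-Hasselbalch relation. The sketch below is illustrative only: the pKa values are approximate textbook figures assumed for free side chains (they are not given in the application and shift within real peptides), and the function name is hypothetical.

```python
# Henderson-Hasselbalch estimate; pKa values are assumed, approximate figures.

ASSUMED_SIDE_CHAIN_PKA = {"Asp": 3.9, "Glu": 4.2}


def fraction_dissociated(ph, pka):
    """Fraction of a carboxyl group present as -COO(-) at the given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


for residue, pka in ASSUMED_SIDE_CHAIN_PKA.items():
    at_ph_3 = fraction_dissociated(3.0, pka)
    at_ph_7 = fraction_dissociated(7.0, pka)
    print(residue, round(at_ph_3, 3), round(at_ph_7, 3))
```

Under these assumed pKa values only a minor fraction of the Asp and Glu side chains is dissociated at about pH 3, whereas nearly all are dissociated at neutral pH, which is consistent with the preference for acidic conditions stated in the text.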
  • the N-terminal α-NH2 groups of proteins in the protein mixture may be blocked (e.g., acylated, preferably acetylated) to prevent their protonation under acidic conditions. Consequently, following tryptic digest of the so-blocked protein mixture (PM), not only C-terminal peptides, but also the majority of or substantially all N-terminal peptides will display about zero net charge under conditions where the side chains of substantially all Arg and Lys (if not blocked) are protonated.
  • N-terminal and C-terminal peptides may be isolated or enriched from a protein mixture (PM) using steps comprising: blocking α-NH2 groups of proteins in a mixture of proteins (PM) to prevent their protonation under acidic conditions; proteolysing the protein mixture (PM) by trypsin or trypsin-like protease to obtain a protein peptide mixture (PPM); isolating, under conditions where substantially all Arg and Lys (if not blocked) side chains are protonated, a subset of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset have about zero net charge under said conditions.
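The three steps above (blocking, proteolysis, charge-based isolation) can be sketched end to end under simplifying assumptions: complete tryptic cleavage, Lys side chains left unblocked, and the same crude integer charge model in which a free α-amino group and each Arg, Lys or His side chain count +1 and the α-carboxyl group counts -1. All names and the sequence are hypothetical.

```python
# End-to-end sketch under simplifying assumptions; names are hypothetical.

def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (idealised)."""
    peptides, start = [], 0
    for index, residue in enumerate(protein):
        next_residue = protein[index + 1] if index + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def approx_net_charge(peptide, n_terminus_blocked):
    """Crude integer net charge under acidic conditions (assumed model)."""
    amino = 0 if n_terminus_blocked else 1
    return amino + sum(r in "KRH" for r in peptide) - 1


def zero_charge_subset(protein, block_protein_n_terminus):
    """Peptides with zero net charge after digestion."""
    selected = []
    for index, peptide in enumerate(tryptic_digest(protein)):
        # Blocking is applied to the intact protein, so only the first
        # peptide can carry a blocked alpha-amino group.
        blocked = block_protein_n_terminus and index == 0
        if approx_net_charge(peptide, blocked) == 0:
            selected.append(peptide)
    return selected


protein = "MASLTRGEWKAAVDNQ"  # hypothetical sequence
without_blocking = zero_charge_subset(protein, block_protein_n_terminus=False)
with_blocking = zero_charge_subset(protein, block_protein_n_terminus=True)
print(without_blocking, with_blocking)
```

Without blocking, only the C-terminal peptide falls into the zero-charge subset; with blocking, the N-terminal peptide joins it, while the internal peptide keeps a net charge of +1.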
  • said conditions may encompass acidic conditions, more preferably pH about 4.0 or less, even more preferably pH about 3.0 or less, such as, e.g., pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5.
  • the modification of -NH2 groups in the protein mixture (PM) may but need not also modify side chain primary amino groups, particularly the ε-NH2 groups of Lys. If ε-NH2 groups of Lys are modified, this can advantageously allow isolation of C-terminal peptides containing Lys.
  • the subset of peptides displaying about zero net charge in the above embodiments can be isolated from peptides of other net charge, particularly peptides showing about +1 net charge, using any method capable of distinguishing analytes on the basis of net charge difference.
  • said method may be ion exchange chromatography (IEC), preferably cation exchange chromatography (CEC), more preferably strong cation exchange (SCX) chromatography.
  • IEC ion exchange chromatography
  • CEC cation exchange chromatography
  • SCX strong cation exchange
  • the peptide subset having about zero net charge will elute faster and/or under less stringent conditions (e.g., lower ionic strength) than the peptides having about +1 net charge, thereby allowing separation of the subsets.
  • the separation can be performed on the entire protein peptide mixture (PPM), in a single step, and may recover the peptides of interest in a single fraction, which greatly increases the capacity, ease of handling and throughput of the present proteomics methods.
  • the CEC, such as SCX chromatography, is columnar.
  • the eluate containing the peptides of interest may be collected and optionally further manipulated before subjecting it to further steps of the present methods.
  • the eluate containing the peptides of interest may be directly (on-line) fed to a system performing the ensuing separation steps of the present method.
  • a further separation step may be inserted to isolate or enrich from said mixture the subset of N-terminal peptides or the subset of C-terminal peptides.
  • the C-terminal peptides may be selectively captured by a capturing agent specific for primary amino groups, such as without limitation a crown ether capture agent (e.g., 18-crown-6).
  • C-terminal peptides may be distinguished from blocked N-terminal peptides based on the difference between the basicity of free α-NH2 groups (present in C-terminal peptides but blocked in N-terminal peptides) vs. the basicity of Arg and Lys side chains (present in N-terminal peptides but absent in most C-terminal peptides).
  • the invention further contemplates embodiments in which the protein mixture (PM) and/or the protein peptide mixture (PPM) are modified such as to introduce differently charged moieties selectively on the N-terminal, C-terminal and/or internal peptides.
  • the so-generated net charge differences between the peptides allow isolation of the desired peptides from the protein peptide mixture (PPM).
  • one or more positive charges may be introduced to N-terminal peptides while one and preferably two or more negative charges may be introduced to internal and C-terminal peptides.
  • the N-terminal peptides may then be bound onto a CEC, preferably SCX column, while the remaining peptides would be found in the eluate.
  • N-terminal and/or C-terminal peptides isolated or enriched for as herein may be subjected to a multidimensional separation process.
  • a sample of analytes is subjected to a sequence of two or more separation steps ("dimensions"), each of which acts upon all or a part of analytes separated in a previous separation step, wherein any two analytes resolved in a given separation step remain resolved in subsequent separation steps, and wherein the distinct separation steps resolve analytes on the basis of different physical and/or chemical properties.
  • any or all fractions from a given separation step may be each individually resolved in a subsequent separation step.
  • the conditions in said steps are preferably orthogonal, such that peptides not resolved (i.e., recovered in same fraction) in one step will be resolved in a further step.
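Orthogonality between two separation steps can be estimated in several ways. The sketch below uses one simple option, the fraction of cells in a two-dimensional retention grid that contain at least one peptide. This metric is an assumption chosen for illustration and is not prescribed by the source; the function name and the example data are hypothetical.

```python
def bin_coverage(points, n_bins=10):
    """Fraction of cells of an n_bins x n_bins grid occupied by a peptide.

    points: iterable of (x, y) retention values, each normalised to [0, 1],
    where x is the first dimension and y the second dimension.
    """
    occupied = set()
    for x, y in points:
        i = min(int(x * n_bins), n_bins - 1)
        j = min(int(y * n_bins), n_bins - 1)
        occupied.add((i, j))
    return len(occupied) / (n_bins * n_bins)


# Fully correlated dimensions: peptides fall on the diagonal only.
correlated = [((k + 0.5) / 10, (k + 0.5) / 10) for k in range(10)]

# Uncorrelated dimensions: peptides spread over the whole plane.
spread = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]

print(bin_coverage(correlated))  # low coverage, poor orthogonality
print(bin_coverage(spread))      # full coverage, good orthogonality
```

A higher coverage indicates that peptides co-eluting in one step are more likely to be resolved in the other.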
  • the present multidimensional separation process may involve 4 separation steps or less, preferably 3 separation steps or less. More preferably, the separation process is two-dimensional (2D). In an embodiment, the stages of the separation process may be coupled in an on-line system.
  • one or more or all separation steps of the multidimensional separation process may be by chromatography.
  • all separation steps may be by chromatography (multidimensional chromatography).
  • the separation process may be 4D-, preferably 3D-, more preferably 2D-chromatography.
  • Chromatographic step(s) of the multidimensional separation process may involve suitable stationary phases, mobile phases (e.g., linear or gradient) and elution conditions known per se.
  • the physical and/or chemical properties based on which peptides can be resolved in the distinct steps of the multidimensional separation process may be chosen from inter alia net charge, electrophoretic mobility (EPM), isoelectric point (pI), molecular size and/or ability or tendency to form certain type(s) of molecular interactions, such as, e.g., dispersive (hydrophobic) interactions, dipole-dipole polar interactions (e.g., hydrogen bonding), dipole-induced dipole polar interactions (e.g., π-π interactions) or ionic interactions.
  • Said properties may be evaluated using a variety of separation techniques known per se in the art, and which may constitute separation steps of the multidimensional process.
  • numerous chromatographic and electrophoretic applications exist to resolve peptides on the basis of the above described properties including inter alia reversed phase high performance liquid chromatography (RP-HPLC), hydrophobic interaction chromatography (HIC), normal-phase HPLC (NP-HPLC), hydrophilic interaction liquid chromatography (HILIC), chromatofocusing, size exclusion chromatography (SEC), ion exchange chromatography (IEC), affinity chromatography (AC), capillary gel electrophoresis (CGE), capillary zone electrophoresis (CZE), micellar electrokinetic chromatography (MEKC), free flow electrophoresis (FFE), isoelectric focusing (IEF) including capillary isoelectric focusing (CIEF), capillary isotachophoresis (CITP), capillary electrochromatography (CEC), and the like.
  • one or more chromatographic separation steps may involve reversed phase (RP) chromatography, preferably RP liquid chromatography, more preferably RP-HPLC.
  • RP chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as but not limited to: aliphatic hydrocarbon moieties, e.g., straight, branched and/or alicyclic, saturated or unsaturated aliphatic hydrocarbon moieties of between 2 and 30 carbon atoms, preferably including straight or branched, more preferably straight, alkyl moieties, more preferably alkyl moieties having between 2 and 30 carbons, such as, e.g., about 18 (octadecyl), about 8 (octyl), 4 (butyl), 3 (propyl) or 2 (ethyl) carbon atoms; aromatic moieties, such as aryl, arylalkyl, aryl
  • one or more electron-withdrawing substituents such as, e.g., -COR, nitro (-NO2), fluorine (-F) or ammonium (-N+R3, -N+HR2, -N+H2R) groups, wherein R is an alkyl; preferred examples include inter alia trinitrophenyl or pentafluorophenyl moieties; or
  • one or more electron-donating moieties such as, e.g., hydroxyl (-OH), alkyloxy such as methoxy (-OMe) or amino (-NR2, -NHR) groups where R is an alkyl; preferred examples include inter alia phenyl, diphenyl, p-methoxyphenyl and 4-N,N-dimethylaminophenyl moieties;
  • Aromatic moieties, as such or substituted, can potentially add other types of interaction apart from hydrophobic interactions, such as inter alia π-π interactions.
  • one or more chromatographic separation steps may involve hydrophilic interaction chromatography (HILIC), such as disclosed by Alpert AJ 1990 (J Chromatogr 499: 177-96) and later developments thereof.
  • Exemplary neutral polar stationary phases for HILIC chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as but not limited to: polar moieties such as, e.g., amino, diol, amide, or polyhydroxyethyl aspartamide; or zwitterionic groups (ZIC-HILIC), such as, e.g., -N+(CH3)2CH2CH2CH2SO3-. Un-derivatised silica is also a commonly used stationary phase for HILIC.
  • the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is RP-HPLC operated at low pH (particularly, pH less than about 5, preferably less than about 4, more preferably less than about 3, such as pH between about 0.5 and about 3 or pH between about 1 and 2.5, yet more preferably pH about 2); and the other dimension (2nd or 1st) is RP-HPLC operated at high pH (particularly, pH more than about 8, preferably more than about 9, such as pH between about 9 and about 12 or pH between about 9 and 11, yet more preferably pH about 10).
  • orthogonality between RP and RP chromatography may be achieved on the basis of pH difference, and often regardless of the RP-functional moiety, since high- and low-pH RP have proven to be orthogonal (see Figure 5).
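One plausible contributor to the different selectivity of RP separations at low and high pH is the change in the ionisation state of the peptides. The sketch below estimates peptide net charge at a given pH with the Henderson-Hasselbalch relation. The pKa values are rounded textbook approximations supplied as assumptions, not values from the source, and the example peptide is hypothetical.

```python
# Approximate pKa values (assumptions; real values depend on sequence context).
PKA_POSITIVE = {"nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Estimate the net charge of an unmodified peptide at the given pH."""
    positive = ["nterm"] + [aa for aa in sequence if aa in PKA_POSITIVE]
    negative = ["cterm"] + [aa for aa in sequence if aa in PKA_NEGATIVE]
    charge = 0.0
    for group in positive:
        # Fraction of the basic group that is protonated (+1).
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[group]))
    for group in negative:
        # Fraction of the acidic group that is deprotonated (-1).
        charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[group] - ph))
    return charge


peptide = "AEDK"  # hypothetical peptide
for ph in (2.0, 10.0):
    print(f"pH {ph}: net charge {net_charge(peptide, ph):+.2f}")
```

For this example the estimate moves from roughly +2 at pH 2 to roughly -2 at pH 10, so the same peptide presents very different polarity to the stationary phase in the two dimensions.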
  • the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is RP-HPLC using stationary phase functionalised with a C18 moiety operated at low pH; and the other dimension (2nd or 1st) is RP-HPLC using stationary phase functionalised with a C18 or phenyl, etc. moiety operated at high pH.
  • C18 column separation at low pH can be employed as the second (or ultimate) dimension, since it can be made highly compatible with a downstream MS analysis.
  • the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is chromatography, preferably RP-HPLC, using stationary phase functionalised with a C18 moiety and the other dimension (2nd or 1st) is HILIC chromatography, preferably ZIC-HILIC.
  • multidimensional separation processes of interest for the invention may comprise or consist of an electrophoretic separation step (e.g., FFE, CIEF, CZE) preferably as a 1st dimension, in conjunction with chromatographic separation such as RP-HPLC or HILIC.
  • N-terminal and/or C-terminal peptides isolated or enriched for as herein may be subjected to a one-dimensional (1D) long-column chromatography separation. While this separation type involves a single dimension, the use of long columns in conjunction with the significantly reduced complexity of the peptide mixture (i.e., enriched for N- and/or C-terminal peptides) allows satisfactory resolution of the constituent peptides to be achieved.
  • separation modes suitable for the 1D long-column chromatography include any chromatography types described above and elsewhere in this specification, such as preferably but without limitation, reversed phase high performance liquid chromatography (RP-HPLC), hydrophobic interaction chromatography (HIC), normal-phase HPLC (NP-HPLC) or hydrophilic interaction liquid chromatography (HILIC).
  • long-column chromatography refers to columnar chromatography, preferably employing liquid mobile phase, more preferably HPLC, using a stationary phase column having length of at least 75 cm, more preferably at least 1 metre, e.g., at least 1.5 m, even more preferably at least 2 m, e.g., at least 2.5 m, and most preferably up to 3 m or even more.
  • chromatography particularly employing liquid mobile phase, more particularly HPLC
  • stationary phase columns having lengths common for peptide separations, such as between about 3 cm and about 50 cm, more preferably between about 5 cm and about 30 cm, even more preferably between about 10 cm and 25 cm. It shall be however appreciated that long columns may be in principle also applicable to the present multidimensional separations.
  • the invention is also directed to a device or system that is able to carry out the methods of the invention, in particular the methods as above comprising the isolation of N-terminal and/or C-terminal peptides, followed by multidimensional separation thereof, and optionally identification of peptides therefrom.
  • the invention relates to a system for sorting peptides comprising: a first chromatographic column for isolating N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM), and two or more downstream chromatographic columns for separating the N-terminal and/or C-terminal peptides into a plurality of fractions in a multidimensional separation process as described herein.
  • the invention also relates to a system for sorting peptides comprising: a first chromatographic column for isolating N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM), and a downstream long chromatographic column for separating the N-terminal and/or C-terminal peptides into a plurality of fractions in a 1D long-column chromatography separation.
  • the first chromatographic column may be an ion exchange column, more preferably a cation exchange column, even more preferably an SCX column.
  • the system may be configured to perform any two or more or all above peptide sorting and separation steps "in-line", i.e., by directly feeding desired analytes from a previous separation element to the subsequent separation element.
  • Figure 1 illustrates a flow-chart of a particular method for isolating peptides comprising C-terminal ends of proteins from a protein peptide mixture.
  • Figure 2 depicts the stoichiometry of the reaction of p-hydroxyphenylglyoxal with the guanidino group of an arginine residue in a peptide chain (R, R in this figure depict the remaining portions of the peptide chain).
  • Figure 3 depicts the stoichiometry of the reaction of nitromalondialdehyde (NMA) with the guanidino group of an arginine residue in a peptide chain (R, R in this figure depict the remaining portions of the peptide chain).
  • Figure 4 represents results obtained in an experiment for isolating C-terminal peptides.
  • Figure 5 illustrates orthogonality of RP separations performed at different pH. Both separations involve the combination of phenyl (1st dimension) and C18-RPLC (2nd dimension) chromatography. Vertical axis: 1st dimension; horizontal axis: 2nd dimension. In (A) both dimensions were operated at low pH; in (B) the phenyl column was operated at high pH and the C18 column at low pH, resulting in improved orthogonality.
  • Figure 6 illustrates 2D orthogonal separation of SCX-sorted N- and C-terminal peptides as described in the examples. The analysis of 24 first-dimension fractions is presented. Vertical axis: 1st dimension; horizontal axis: 2nd dimension.
  • Figure 7 substantiates reproducibility of the peptide sorting and identification approach described in the examples. A triplicate experiment was performed using the same sample treated in parallel prior to depletion. 89% of the quantifiable features are present in at least 2 of the 3 samples.
  • Figure 8 illustrates orthogonality provided by ZIC-HILIC (1st dimension) and RPLC (2nd dimension). Upper panel: 1st dimension. Lower panel: 2nd dimension of fractions indicated in upper panel.
  • protein refers to naturally or recombinantly produced macromolecules comprising one or more polypeptide chains, i.e., polymeric chains of amino acid residues linked by peptide bonds.
  • the term thus encompasses monomeric proteins, as well as protein dimers (hetero- as well as homo-dimers) and protein multimers (hetero- as well as homo-multimers).
  • the term also encompasses proteins that carry one or more co- or post-expression modifications of the polypeptide chain(s), such as, e.g., glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc.
  • the term includes nascent protein chains as well as partly or wholly folded proteins, misfolded proteins, partly or wholly unfolded or denatured proteins, and may also cover coalesced or aggregated proteins.
  • the term further also includes protein variants or mutants which carry amino acid sequence variations vis-a-vis a corresponding native protein, such as, e.g., amino acid deletions, additions and/or substitutions.
  • the term contemplates both full-length proteins and protein parts, preferably naturally-occurring protein parts, that ensue from further processing of said full-length proteins.
  • the methods of the invention are suitable for analysing individual proteins (such as, e.g., proteins isolated using SDS-PAGE or 2D-electrophoresis, etc.) as well as mixtures of proteins, including complex mixtures.
  • the methods of the invention may be preferably suited for analysing proteins or mixtures of proteins, wherein average and/or median length of the polypeptide chain(s) is at least about 20 amino acids, preferably at least about 50 amino acids, more preferably at least about 100 amino acids, still more preferably at least about 200 amino acids, or even at least about 500 amino acids or more.
  • the invention is particularly suitable for analysing mixtures of proteins, including complex protein mixtures.
  • “mixture of proteins” or “protein mixture” generally refer to a mixture of two or more different proteins, e.g., a composition comprising said two or more different proteins.
  • a mixture of proteins to be analysed herein may include more than about 10, preferably more than about 50, even more preferably more than about 100, yet more preferably more than about 500 different proteins, such as, e.g., more than about 1000 or more than about 5000 different proteins, or preferably even more than about 10,000 or 20,000 or 30,000 or more different proteins.
  • An exemplary complex protein mixture may involve, without limitation, all or a fraction of proteins present in a biological sample or part thereof.
  • peptide or protein peptide as used herein generally refer to fragments of a protein derived by fragmentation of said protein or of any one or more of its polypeptide chains, into two or more fragments. While the terms encompass peptides of any sizes and molecular weights, peptides and protein peptide mixtures preferred in the invention may have average and/or median length of less than about 200 amino acids, e.g., less than about 150 amino acids, preferably less than about 100 amino acids, e.g., less than about 90 amino acids, less than about 80 amino acids, less than about 70 amino acids or less than about 60 amino acids, and even more preferably less than about 50 amino acids, e.g., less than about 40 amino acids or less than about 30 amino acids.
  • peptides and protein peptide mixtures preferred in the invention may have average and/or median length of at least about 5 amino acids, preferably at least about 10 amino acids, even more preferably at least about 15 amino acids, e.g., at least about 20 amino acids.
  • peptides and protein peptide mixtures preferred in the invention may have average and/or median length of between about 5 and about 200 amino acids, preferably between about 5 and about 100 amino acids, also preferably between about 10 and about 100 amino acids, even more preferably between about 10 and about 50 amino acids, e.g., between about 10 and about 40 amino acids or between about 10 and about 30 amino acids.
  • Such peptide sizes are particularly amenable to analysis using the methods of the invention.
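Whether a given peptide mixture falls within one of the preferred length windows above can be checked with a few lines of code. The sketch below is illustrative only: it uses the window of about 10 to about 50 amino acids as default bounds, and the function name and example sequences are hypothetical.

```python
from statistics import mean, median


def length_summary(peptides, low=10, high=50):
    """Report mean and median peptide length and whether both lie in [low, high]."""
    lengths = [len(p) for p in peptides]
    avg, med = mean(lengths), median(lengths)
    return {
        "mean": avg,
        "median": med,
        "within_range": low <= avg <= high and low <= med <= high,
    }


# Hypothetical peptide mixture with lengths 12, 20 and 28.
mixture = ["A" * 12, "A" * 20, "A" * 28]
summary = length_summary(mixture)
print(summary)
```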
  • “peptide mixture” or “mixture of peptides” generally refer to a mixture of two or more different peptides, e.g., a composition comprising said two or more different peptides.
  • protein peptide mixture generally refers to a mixture of peptides derived from a protein or from a mixture of two or more different proteins (i.e., protein mixture).
  • protein peptide mixture may also encompass peptide mixtures that include only a portion of all peptides obtained by fragmentation of a protein or a mixture of proteins, e.g., by fragmentation of all or a part of proteins present in a biological sample.
  • said portion of peptides may be selected from said all peptides on the basis of one or more selection criteria of interest, such as, without limitation, molecular weight, net charge, hydrophilicity and/or hydrophobicity of the constituent peptides, before being subjected to the methods of the invention.
  • a protein peptide mixture may be derived from a complex mixture of proteins, such as, e.g., from all or a fraction of proteins present in a biological sample or part thereof.
  • a protein peptide mixture may be thus obtained by fragmentation of all or a fraction of proteins present in and/or isolated from a biological sample after the sample has been obtained or removed from biological source.
  • the proteins may be fragmented so as to yield protein peptide mixtures having preferred average or median chain lengths as detailed above. It can be expected that, depending on the number of different proteins subjected to the fragmentation, their average or median size and the incidence of fragmentation thereof, the resulting protein peptide mixtures may easily comprise up to 1,000, 5,000, 10,000, 20,000, 30,000, 50,000, 100,000, 200,000, 300,000 or more different peptides.
  • the protein peptide mixture can also originate directly from a biological sample.
  • urine comprises, besides proteins, a very complex peptide mixture resulting from proteolytic degradation of proteins in the body and elimination of the resulting peptides via the kidneys.
  • the method may employ protein peptide mixtures obtained from biological samples without further fragmentation in vitro.
  • samples may be obtained from: viruses, e.g., viruses of prokaryotic or eukaryotic hosts; prokaryotic cells, e.g., bacteria or archaea, e.g., free-living or planktonic prokaryotes or colonies or bio-films comprising prokaryotes; eukaryotic cells or organelles thereof, including eukaryotic cells obtained from in vivo or in situ or cultured in vitro; eukaryotic tissues or organisms, e.g., cell-containing or cell-free samples from eukaryotic tissues or organisms; eukaryotes may comprise protists, e.g., protozoa or algae, fungi, e.g., yeasts or molds, plants and animals, e.g., mammals, humans or non
  • Biological sample may thus encompass, for instance, a cell, tissue, organism, or extracts thereof.
  • a biological sample may be preferably removed from its biological source, e.g., from an animal such as mammal, human or non-human mammal, by suitable methods, such as, without limitation, collection or drawing of urine, saliva, sputum, semen, milk, mucus, sweat, faeces, etc., drawing of blood, cerebrospinal fluid, interstitial fluid, optic fluid (vitreous) or synovial fluid, or by tissue biopsy, resection, etc.
  • a biological sample may be further subdivided to isolate or enrich for parts thereof to be used for obtaining proteins for analysing using the methods of the invention.
  • tissue types may be separated from each other; specific cell types or cell phenotypes may be isolated from a sample, e.g., using FACS sorting, antibody panning, laser-capture dissection, etc.; cells may be separated from interstitial fluid, e.g., blood cells may be separated from blood plasma or serum; or the like.
  • the sample can be applied to the method directly or can be processed, extracted or purified to varying degrees before being used.
  • the sample can be derived from a healthy subject or a subject suffering from a condition, disorder, disease or infection.
  • the subject may be a healthy animal, e.g., human or non-human mammal, or an animal, e.g., human or non-human mammal, who has cancer, an inflammatory disease, autoimmune disease, metabolic disease, CNS disease, ocular disease, cardiac disease, pulmonary disease, hepatic disease, gastrointestinal disease, neurodegenerative disease, genetic disease, infectious disease or viral infection, or other ailment(s).
  • protein mixtures derived from biological samples may be treated to deplete highly abundant proteins therefrom, in order to increase the sensitivity and performance of proteomics analyses.
  • mammalian such as human serum or plasma samples may include abundant proteins, inter alia albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin and fibrinogen, which may preferably be so-depleted from the samples.
  • Methods and systems for removal of abundant proteins are known, such as, e.g., immuno-affinity depletion, and are frequently commercially available, e.g., the Multiple Affinity Removal System (e.g., MARS-7, MARS-14) from Agilent Technologies (Santa Clara, California).
  • subset of peptides out of a protein peptide mixture denotes a fraction of the total set of peptides present in the protein peptide mixture.
  • Such a fraction can preferably amount to 50% or less of the total set of peptides in the protein peptide mixture, e.g., to less than 40% or less than 30% of said total set of peptides, more preferably to less than 20% of said total set of peptides, e.g., less than 15% of said total set of peptides, and even more preferably to less than 10% of said total set of peptides, such as, e.g., to less than 8%, less than 6%, less than 4%, less than 2%, less than 1%, less than 0.1% or even less than 0.01% of said total set of peptides in the protein peptide mixture.
  • this reduced size of the subset of peptides vis-a-vis the total set of peptides in the protein peptide mixture allows
  • isolating or enriching a subset of peptides out of a protein peptide mixture generally means setting apart or separating said subset of peptides from the remaining peptides of the protein peptide mixture, such that said subset of peptides can be identified, analysed and/or recovered (e.g., in a composition or in purified form) separately from said remaining peptides of the protein peptide mixture.
  • the term "isolating” denotes a process of separating the recited peptides from other peptides of a protein peptide mixture (PPM), such that said recited peptides can be identified, quantified, analysed and/or recovered (e.g., in a composition or in purified form) separately from said other peptides.
  • a process of isolation may recover at least about 50%, e.g., at least about 60%, and more preferably substantially all of the recited peptides present in the protein peptide mixture (PPM).
  • the group of so-isolated peptides may comprise less than about 50%, e.g., less than about 40%, more preferably less than about 30%, even more preferably less than about 20%, still more preferably less than about 10%, and yet more preferably less than about 5%, such as, e.g., less than about 4%, 3%, 2% or 1% or even down to 0% of peptides from the protein peptide mixture (PPM) other than the recited peptides.
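The two figures of merit described above, the share of recited peptides recovered and the share of other peptides in the isolated group, can be computed as follows. This is a minimal sketch; the function name and the peptide identifiers are hypothetical.

```python
def isolation_metrics(isolated, targets):
    """Return (recovery, contamination) for an isolation step.

    recovery       fraction of the recited (target) peptides that were isolated
    contamination  fraction of the isolated group that is not a target peptide
    """
    isolated, targets = set(isolated), set(targets)
    recovery = len(isolated & targets) / len(targets) if targets else 0.0
    contamination = len(isolated - targets) / len(isolated) if isolated else 0.0
    return recovery, contamination


# Hypothetical identifiers: four target peptides, three of which are isolated
# together with one non-target peptide.
targets = ["p1", "p2", "p3", "p4"]
isolated = ["p1", "p2", "p3", "x1"]
recovery, contamination = isolation_metrics(isolated, targets)
print(f"recovery={recovery:.0%}, contamination={contamination:.0%}")
```

In this example the recovery is 75% and the contamination 25%, which would satisfy the "at least about 50%" and "less than about 50%" thresholds given above.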
  • a peptide bond "adjacent" to a given amino acid residue may be a peptide bond which involves the Cα amino group of said amino acid residue and the Cα carboxyl group of the previous amino acid residue ("N-terminally adjacent" peptide bond), or a peptide bond which involves the Cα carboxyl group of said amino acid residue and the Cα amino group of the following amino acid residue ("C-terminally adjacent" peptide bond).
  • By means of illustration, in a sequence of residues (AA-1)AA(AA+1), the peptide bond N-terminally adjacent to residue AA is indicated with an arrowhead, and the peptide bond C-terminally adjacent to residue AA is indicated with an arrow: [diagram not reproduced]
  • less than 20% of peptide bonds other than the recited ones would be cleaved, e.g., less than 15%, more preferably less than 10%, e.g., less than 7%, even more preferably less than 5%, e.g., less than 4%, less than 3% or less than 2%, and most preferably less than 1%, e.g., less than 0.5%, less than 0.1%, or less than 0.01% or even less.
  • substantially all means 70% or more, e.g., 75% or more, preferably 80% or more, e.g., 85% or more, more preferably 90% or more, even more preferably 95% or more, and most preferably at least 96%, at least 97%, at least 98%, at least 99% or even 100% of said members or entities.
  • fragmentation refers to cleavage, preferably enzymatic or chemical cleavage, of one or more peptide bonds within said protein or within any one or more of its polypeptide chains. Fragmentation of protein mixture denotes fragmentation of proteins constituting said protein mixture.
  • proteins or protein mixtures may be fragmented so as to yield protein peptide mixtures having the preferred average or median chain lengths as detailed above.
  • N-terminal peptide N-terminal end of said protein or polypeptide chain
  • C-terminal peptide C-terminal end of said protein or polypeptide chain
  • fragmentation additionally produces one or more peptides derived from the portion of the protein or polypeptide chain interposed between the parts corresponding to the N- and C-terminal peptides ("internal peptides").
  • fragmentation as intended herein of the protein (P) or the mixture of proteins (PM) to achieve a protein peptide mixture (PPM) may be effected by suitable physical, chemical and/or enzymatic agents, more preferably chemical and/or enzymatic agents, even more preferably enzymatic agents, e.g., proteinases, preferably endoproteinases.
  • the fragmentation may be achieved by one or more, preferably one, protease (proteolytic enzyme), more preferably by one or more, preferably one, endoprotease (endopeptidase, proteinase, endoproteinase), i.e., a protease cleaving internally within a polypeptide chain.
  • endoproteinases suitable for such fragmentation include endoproteinases selected from serine proteinases (EC 3.4.21), threonine proteinases (EC 3.4.25), cysteine proteinases (EC 3.4.22), aspartic acid proteinases (EC 3.4.23), metalloproteinases (EC 3.4.24) and glutamic acid proteinases.
  • protein fragmentation may be achieved using trypsin, chymotrypsin, elastase, Lysobacter enzymogenes endoproteinase Lys-C, Staphylococcus aureus endoproteinase Glu-C (endopeptidase V8) or Clostridium histolyticum endoproteinase Arg-C (clostripain).
  • the invention encompasses the use of any further known or yet to be identified enzymes; a skilled person can choose suitable protease(s) on the basis of their cleavage specificity to achieve desired protein peptide mixtures of the invention.
  • the fragmentation as intended herein may be effected by endopeptidases of the trypsin type (EC 3.4.21.4), preferably trypsin, such as, without limitation, preparations of trypsin from bovine pancreas, human pancreas, porcine pancreas, recombinant trypsin, Lys-acetylated trypsin, etc.
  • trypsin cleaves highly specifically peptide bonds C-terminally adjacent to arginine and lysine residues (except where the following residue is Pro), and also cleaves C-terminally adjacent to homoarginine residues, albeit at a slower rate.
  • the invention also contemplates the use of any trypsin-like protease, i.e., with a similar specificity to that of trypsin. Trypsin is particularly useful in proteomics applications, inter alia due to high specificity and efficiency of its cleavage.
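The trypsin cleavage rule stated above (cleavage C-terminally adjacent to Arg and Lys, except where the following residue is Pro) can be sketched as a simple in silico digestion. The code assumes complete digestion with no missed cleavages and ignores homoarginine; the function names and the example sequence are hypothetical.

```python
def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (complete digestion)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # remainder is the C-terminal peptide
    return peptides


def classify(peptides):
    """Label the peptides of one polypeptide chain by their position."""
    return {
        "N-terminal": peptides[0],
        "internal": peptides[1:-1],
        "C-terminal": peptides[-1],
    }


peptides = trypsin_digest("MASKPLLRGDEKTTW")  # hypothetical sequence
print(peptides)  # ['MASKPLLR', 'GDEK', 'TTW']
print(classify(peptides))
```

Note that the Lys-Pro bond in the example is left intact, and that the resulting C-terminal peptide contains neither Lys nor Arg, as is the case for most C-terminal peptides of a tryptic digest.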
  • chemical reagents may be used to fragment proteins into peptides. For example, CNBr can fragment proteins at Met; BNPS-skatole can fragment at Trp. Alternatively, chemical fragmentation can also be achieved by limited protein hydrolysis under acidic conditions.
  • the conditions for treatment e.g., protein concentration, enzyme or chemical reagent concentration, pH, buffer, temperature, time
  • reacting generally refers to bringing together of designated reactants under conditions that allow a desired chemical transformation, such that compound(s) different from the reactants initially introduced to the reaction are generated.
  • chromatography encompasses methods for separating chemical substances, referred to as such and widely available in the art.
  • chromatography refers to a process in which a mixture of chemical substances (analytes) carried by a moving stream of liquid or gas ("liquid phase” or “mobile phase”) is separated into components as a result of differential distribution of the solutes or analytes, as they flow around or over a stationary liquid or solid phase (“stationary phase”), between said liquid or mobile phase and said stationary phase.
  • the stationary phase may be usually a finely divided solid, a sheet of filter material, or a thin film of a liquid on the surface of a solid, or the like.
  • Chromatography is also widely applicable for the separation of chemical compounds of biological origin, such as, e.g., amino acids, proteins, fragments of proteins, peptides, phospholipids, steroids, etc.
  • exemplary types of chromatography include, without limitation, high-performance liquid chromatography (HPLC), normal phase chromatography (NP) such as NP-HPLC, reversed phase chromatography (RP) such as RP-HPLC, ion exchange chromatography, such as cation or anion exchange chromatography, hydrophilic interaction chromatography (HILIC), hydrophobic interaction chromatography (HIC), size exclusion chromatography (SEC) including gel filtration chromatography or gel permeation chromatography, chromatofocusing, affinity chromatography such as immuno-affinity and immobilised metal affinity chromatography.
  • the chromatography may be columnar, i.e., wherein the stationary phase is deposited or packed in a column.
  • the migration of solutes in chromatography can be expressed as "elution time” or “retention time”, being the time elapsed between the start of the chromatographic separation and the moment at which a solute of interest emerges at a given distance along the stationary phase where it is detected and/or collected.
  • basic amino acid generally refers to amino acids, preferably α-amino acids, more preferably α-L-amino acids, such as present in proteins, wherein the pKa of the protonated form of their side chain (i.e., pKR) is about 4 or greater (≥ 4), preferably about 5 or greater (≥ 5), and more preferably about 6 or greater, e.g., about 8 or greater (≥ 8) or about 10 or greater (preferably, e.g., ≥ 9 or ≥ 10).
  • Particularly preferred basic amino acids include lysine, arginine, histidine and homoarginine, preferably Lys, Arg and His.
  • the side chain of basic amino acids comprises a basic moiety, such as, e.g., the imidazole moiety of histidine, the ε-amino moiety of lysine or the guanidino moiety of arginine and homoarginine.
  • net charge refers to the arithmetic sum of the charges of all the atoms taken together for a molecule.
  • the term “about zero net charge” encompasses zero net charge but may also encompass small deviations therefrom, such as charges between -0.2 and +0.2, more preferably between -0.1 and +0.1 or even more preferably between -0.05 and +0.05.
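As a rough illustration of the "net charge" and "about zero net charge" definitions above, the following sketch estimates a peptide's net charge at a given pH by summing the fractional charges of its ionisable groups (Henderson-Hasselbalch). The pKa values are assumed, approximate textbook figures and are not taken from this specification; the function names are hypothetical.

```python
# Assumed, approximate textbook pKa values (not from the specification).
PKA_BASIC = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"C_TERM": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Sum the fractional charges of all ionisable groups at the given pH."""
    basic = ["N_TERM"] + [aa for aa in peptide if aa in PKA_BASIC]
    acidic = ["C_TERM"] + [aa for aa in peptide if aa in PKA_ACIDIC]
    charge = 0.0
    for group in basic:
        # Fraction of the basic group that is protonated (+1).
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[group]))
    for group in acidic:
        # Fraction of the acidic group that is dissociated (-1).
        charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[group] - ph))
    return charge


def is_about_zero(charge: float, tolerance: float = 0.2) -> bool:
    """Check the broadest 'about zero' window given above (-0.2 to +0.2)."""
    return -tolerance <= charge <= tolerance
```

Passing `tolerance=0.1` or `tolerance=0.05` to `is_about_zero` gives the narrower preferred windows.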
  • the term “majority” is synonymous with “substantially all” as defined herein and refers to 70% or more, e.g., 75% or more, preferably 80% or more, e.g., 85% or more, more preferably 90% or more, even more preferably 95% or more, and most preferably at least 96%, at least 97%, at least 98%, at least 99% or even 100%.
  • dissociated and protonated refer to ionisation states of an atom or a moiety, wherein “dissociation” denotes the loss of H + , such as by acidic moieties and “protonation” denotes acceptance of H + , in particular by basic moieties.
  • strong cation exchange or “strong acid cation exchange” or “SCX” chromatography refers to cation exchange chromatography (preferably columnar chromatography or solid phase extraction techniques inter alia SCX using solid phase extraction cartridges, magnetic or centrifugable SCX beads, etc.) using strong acid cation exchange resins, as well-known in the art, preferably using a stationary phase that maintains constant net negative charge in the range of pH about 2-12, preferably about 1-14, or even substantially irrespective of pH.
  • SCX stationary phase may include solid support functionalised with strong acidic groups, such as preferably sulphonic acid groups.
  • such resins may be of gelular or macroporous type and may contain strong acidic groups, such as preferably sulphonic acid groups, in the free acid (H- form) or neutralised (salt-form, for example, sodium or potassium salts) state.
  • a non-limiting example hereof may be, e.g., wide-pore silica packing with a bonded coating of hydrophilic polymer, e.g., poly(2-sulfoethyl aspartamide); see, e.g., Crimmins et al. 1988 (J Chromatogr 443: 63-71 ).
  • typically, elution of solutes in SCX chromatography can be achieved with salt solutions, such as, e.g., NaCl, KCl or (NH4)2SO4 gradients.
  • SCX columns may be used herein, such as without limitation ones summarised in Table 2:
  • strong anion exchange or "SAX” chromatography generally refers to anion exchange chromatography (preferably columnar chromatography or solid phase extraction techniques inter alia SAX using solid phase extraction cartridges, magnetic or centrifugable SAX beads, etc.), using a stationary phase that maintains constant net positive charge in the range of pH about 2-12, preferably about 1-14, or even substantially irrespective of pH.
  • SAX stationary phase may include solid supports functionalised with quaternary ammonium groups, such as inter alia -CH2CH2N+(CH2CH3)2CH2CH(OH)CH3.
  • substituted means that one or more hydrogens on the atom (typically a C-, N-, O- or S-atom, usually a C-atom) indicated by the modifier "substituted” is replaced with a selection from the specified group, provided that the indicated atom's normal valence is not exceeded, and that the substitution results in a chemically stable compound, i.e., a compound that is sufficiently robust to survive preparation and/or isolation to a useful degree of purity.
  • the term “one or more” covers the possibility of all the available atoms, where appropriate, to be substituted, preferably, one, two or three.
  • each definition is independent.
  • single bond refers to the direct joining by a single covalent bond of the substituents flanking (preceding and succeeding) the variable taken as a "single bond".
  • variable X in formula (I) is said to be a single bond, this refers to direct joining by a single covalent bond of the two carbon atoms flanking X in formula (I).
  • alkyl refers to (preferably monofunctional) saturated straight or branched hydrocarbon radicals.
  • exemplary alkyl radicals include inter alia C1-20 alkyls, C1-10 alkyls, C1-6 alkyls or C1-4 alkyls, such as without limitation methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, octyl, octadecyl and the like.
  • C1-6 alkyl alone or as part of another group, means a mono-functional (monovalent) saturated branched or un-branched hydrocarbon radical of between 1 and 6, e.g., 1, 2, 3, 4, 5 or 6, carbon atoms.
  • C1-6 alkyl radicals encompass, for example, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, isoamyl, and the like.
  • C1-4 alkyl alone or as part of another group, means a mono-functional (monovalent) saturated branched or un-branched hydrocarbon radical of between 1 and 4, e.g., 1, 2, 3 or 4, carbon atoms.
  • C1-4 alkyl radicals encompass, for example, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl and tert-butyl radicals, and the like.
  • C1-4 haloalkyl refers to a C1-4 alkyl radical as defined herein in which at least one hydrogen atom on the C1-4 alkyl radical is replaced by a halogen atom, preferably -F, -Cl, -Br or -I, more preferably -F, -Cl or -Br, even more preferably -F or -Cl.
  • C1-4 perhaloalkyl refers to a C1-4 alkyl radical as defined herein in which all hydrogen atoms on the C1-4 alkyl radical are replaced by same or different halogen atoms, preferably -F, -Cl, -Br or -I, more preferably -F, -Cl or -Br, even more preferably -F or -Cl.
  • C1-4 alkylene alone or as part of another group, means a bi-functional (bivalent) saturated branched or un-branched hydrocarbon radical of between 1 and 4, e.g., 1, 2, 3 or 4, carbon atoms.
  • C1-4 alkylene radicals encompass, e.g., methylene, ethylene, propylene, methylethylene, butylene, and the like.
  • C3-8 cycloalkyl alone or as part of another group, means a mono-functional (monovalent) saturated or partially unsaturated, monocyclic, bi-cyclic or polycyclic hydrocarbon radical wherein each cyclic moiety contains between 3 and 8, e.g., 3, 4, 5, 6, 7 or 8, carbon atoms.
  • a partially unsaturated C3-8 cycloalkyl radical contains at least one double bond in at least one of its cyclic moieties.
  • C3-8 cycloalkyl radical may be preferably monocyclic or bi-cyclic, more preferably monocyclic.
  • Examples of monocyclic C3-8 cycloalkyl radicals include, without limitation, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cycloheptyl, cyclooctyl, cyclopentenyl, cyclohexenyl, and the like.
  • aryl alone or as part of another group, means a mono-functional (monovalent), monocyclic, bi-cyclic or tri-cyclic aromatic hydrocarbon radical.
  • each cyclic moiety of an aryl radical may contain 6 carbon atom ring members.
  • an aryl radical is a phenyl, diphenyl or naphthyl radical, more preferably phenyl radical.
  • Naphthyl radical encompasses, e.g., 1- or 2-naphthyl radicals.
  • heterocyclyl alone or as part of another group, means a mono-functional (monovalent), monocyclic, bi-cyclic or polycyclic, saturated or partially unsaturated, heterocyclic radical.
  • each cyclic moiety of a heterocyclyl radical contains between 3 and 12 ring members, more preferably between 5 to 10 ring members and still more preferably 5 to 6 ring members.
  • At least one cyclic moiety of a heterocyclyl radical, preferably more than one and more preferably each cyclic moiety of the heterocyclyl radical contains one or more heteroatom ring members selected from nitrogen, oxygen or sulphur.
  • heterocyclyl radicals include, without limitation, mono-functional radicals of dihydropyrrole, tetrahydropyrrole, dihydrofuran, tetrahydrofuran, dihydrothiophene, tetrahydrothiophene, piperidine, pyran, dihydropyran, tetrahydropyran, piperazine, oxazine, dioxane, dithiane, and the like.
  • heteroaryl alone or as part of another group, means a (preferably mono-functional, i.e., monovalent) monocyclic, bi-cyclic or tri-cyclic aromatic heterocyclic radical.
  • each cyclic moiety of a heteroaryl radical contains between 3 and 12 ring members, more preferably between 5 to 10 ring members and still more preferably 5 to 6 ring members.
  • At least one cyclic moiety of a heteroaryl radical preferably more than one and more preferably each cyclic moiety of the heteroaryl radical, contains one or more heteroatom ring members selected from nitrogen, oxygen or sulphur.
  • Exemplary heteroaryl radicals include, without limitation, mono-functional radicals of pyridine, pyrrole, imidazole, pyrazole, oxazole, thiazole, furan, pyridazine, pyrimidine, pyrazine, thiophene, and the like.
  • "radical 2 radical 1" depicts, alone or as part of another group, the radical 1, in which at least one hydrogen atom on the radical 1 is replaced by radical 2.
  • aryl C1-4 alkyl alone or as part of another group, means a C1-4 alkyl radical as defined herein, in which at least one hydrogen atom on the C1-4 alkyl radical is replaced by an aryl radical as defined herein.
  • C1-6 alkoxy or "C1-6 alkyloxy" means a radical of the formula -O-C1-6 alkyl, wherein C1-6 alkyl is as defined herein;
  • C1-4 alkoxy or "C1-4 alkyloxy" means a radical of the formula -O-C1-4 alkyl, wherein C1-4 alkyl is as defined herein;
  • aryloxy means a radical of the formula - O-aryl, wherein aryl is as defined herein;
  • heteroaryloxy means a radical of the formula -O-heteroaryl, wherein heteroaryl is as defined herein;
  • heterocyclyloxy means a radical of the formula -O-heterocyclyl, wherein heterocyclyl is as defined herein;
  • C1-3 alkylthio means a radical of the formula -S-C1-3 alkyl, wherein C1-6 alkyl is as defined herein;
  • the first group of aspects of the invention concerns a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of: (a) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2, ... Xn) to obtain the protein peptide mixture (PPM); and
  • the protein (P) or the mixture of proteins (PM) is fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types (denoted as X1, X2, ... Xn) to obtain the protein peptide mixture (PPM).
  • said distinction allows the C-terminal peptides of the protein (P) or protein mixture (PM) to be separated and isolated.
  • such preferential fragmentation at specific peptide bonds may allow prediction, e.g., in silico, of the resulting peptides and relevant properties thereof, e.g., charge, size, molecular weight, etc.
  • This information can be used to identify the peptides, in particular the isolated peptides comprising the C-terminal ends from the protein (P) or the protein mixture (PM), and consequently deduce the identity of the proteins subjected to the fragmentation.
  • the protein (P) or protein mixture (PM) may be fragmented at substantially all peptide bonds C-terminally adjacent to amino acid residues of the one or more types X1, X2, ... Xn present in said protein (P) or protein mixture (PM).
  • the fragmentation occurs substantially quantitatively after all amino acid residues of the one or more types X1, X2, ... Xn.
  • Most peptides comprising the C-terminal ends from the protein (P) or protein mixture (PM) will thus not comprise any of the one or more amino acid residues X1, X2, ... and Xn.
  • the method of the invention may subsequently employ either specific modification or specific removal of the one or more amino acid residue types X1, X2, ... and Xn from the obtained peptides to discriminate away N-terminal and internal peptides as taught herein.
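The reasoning above (quantitative fragmentation after residue types X1, X2, ... Xn leaves those residues at the C-terminus of every N-terminal and internal peptide, so peptides lacking them are candidates for the C-terminal end) can be sketched as follows. This is an idealised illustration: the function names are hypothetical, one-letter codes are assumed, and a protein whose own last residue is one of the chosen types would not be captured.

```python
def digest_after(sequence: str, residues: tuple[str, ...] = ("K", "R")) -> list[str]:
    """Idealised, quantitative cleavage after every residue of the chosen types."""
    peptides = []
    current = ""
    for aa in sequence:
        current += aa
        if aa in residues:
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def c_terminal_candidates(peptides: list[str], residues: tuple[str, ...] = ("K", "R")) -> list[str]:
    """Keep only peptides that contain none of the chosen residue types."""
    return [p for p in peptides if not any(aa in residues for aa in p)]


peptides = digest_after("MAKTGRLSE")
print(peptides, c_terminal_candidates(peptides))
```

The `residues` argument can be swapped for another selection contemplated in the text, e.g. `("M", "C")` or `("D", "E")`.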
  • This embodiment also advantageously produces smaller-size peptides.
  • the protein (P) or protein mixture (PM) may be fragmented at substantially all peptide bonds C-terminally adjacent to the one or more amino acid residue types X1, X2, ... Xn only when the respective residue X1, X2, ... or Xn forms a part of a specific sequence element, e.g., a sequence element of ≤ 10, preferably ≤ 7, more preferably ≤ 5, even more preferably ≤ 3 and most preferably 2 amino acids, preferably consecutive amino acids. While peptides comprising the C-terminal ends from the protein (P) or protein mixture (PM) may thus comprise the one or more amino acid residue X1, X2, ...
  • the method of the invention may subsequently employ specific removal (e.g., preferably by a carboxypeptidase) of the one or more amino acid residues X1, X2, ... and Xn from the obtained peptides to discriminate away N-terminal and internal peptides, as taught herein.
  • the one or more specific amino acid residue types X1, X2, ... Xn downstream of which fragmentation is contemplated herein may be selected from any amino acid residues, including, but not limited to amino acids found in naturally occurring proteins, amino acids carrying a post-translational modification, amino acids including a non-natural isotope, or amino acids further chemically and/or enzymatically altered prior to the fragmentation, etc.
  • the protein (P) or the mixture of proteins (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X1, X2, ... Xn, wherein said one or more amino acid residue types X1, X2, ...
  • Xn are chosen from the group consisting of: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser, Thr, homoarginine, 4-hydroxyproline, ε-N,N,N-trimethyllysine, 3-methylhistidine, 5-hydroxylysine, O-phosphoserine, γ-carboxyglutamate, ε-N-acetyllysine, ω-N-methylarginine, N-acetylserine, N,N,N-trimethylalanine, citrulline, ornithine and homocysteine.
  • said one or more amino acid residue types X1, X2, ... Xn may be chosen from the 20 amino acid types commonly found in natural proteins: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser or Thr.
  • amino acid residue types X1, X2, ... or Xn may be preferably selected from amino acid types whose side chain comprises comparably reactive moieties.
  • mercapto
  • alkylthio (-S-alkyl), preferably C1-3 alkylthio, more preferably methyl
  • the one or more amino acid residue types X1, X2, ... Xn may comprise a moiety chosen from mercapto; methylthio; p-hydroxyphenyl; primary amino; indyl; imidazyl; guanidino; and carboxyl.
  • the one or more amino acid residue types X1, X2, ... Xn may comprise a moiety chosen from mercapto and methylthio; or chosen from primary amino, indyl, imidazyl and guanidino; or chosen from primary amino and guanidino; or chosen from guanidino; or chosen from p-hydroxyphenyl; or chosen from carboxyl.
  • the one or more amino acid residue types X1, X2, ... Xn may be chosen from the group consisting of Met, Cys, Tyr, Trp, His, Lys, hArg, Arg, Glu and Asp; or the group consisting of Met, Cys, Tyr, Trp, His, Lys, Arg, Glu and Asp; or the group consisting of His, Lys and Arg; or the group consisting of Lys, Arg and hArg; or the group consisting of Lys and Arg; or the group consisting of Arg and hArg; or the group consisting of Met and Cys; or the group consisting of Tyr and Trp; or the group consisting of Asp and Glu.
  • the present method further comprises reacting the protein peptide mixture (PPM) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2, ... Xn.
  • the protein peptide mixture (PPM) or portion thereof may be reacted with a suitable agent under conditions allowing said agent to react with and desirably modify or alter the one or more amino acid residues X1, X2, ... Xn in peptides comprising such.
  • peptides that comprise said one or more amino acid residues X1, X2, ... or Xn may be modified to form an adduct between the one or more amino acid residues X1, X2, ... or Xn and one or more molecules of the modifying agent.
  • the resulting adduct is stable, such that it persists in substantially all so modified peptides, and more preferably in substantially all so modified amino acid residues X1, X2, ... Xn, under the physical (e.g., temperature, light) and chemical (e.g., pH, ionic strength, solvents) conditions used in subsequent steps of the method of the invention, such as, e.g., in the step employed to separate the unaltered peptides from the altered peptides.
  • the term "specifically modify" means that as a result of said reacting, peptides that comprise the one or more amino acid residues X1, X2, ...
  • said reacting is quantitative, i.e., as a result thereof, substantially all peptides that comprise the one or more amino acid residues X1, X2, ... or Xn will become modified with the modifying agent used. Moreover, as a result of said reacting, substantially all of the one or more amino acid residues X1, X2, ... and Xn present in peptides of the protein peptide mixture can become modified.
  • the protein peptide mixture (PPM) or portion thereof may be reacted with a suitable agent under conditions allowing said agent to remove the one or more amino acid residues X1, X2, ... Xn from peptides comprising such.
  • "removal" of a given amino acid residue from a peptide refers to cleaving, e.g., hydrolysing, of the peptide bond(s) that connect said residue to the remainder of the peptide.
  • Xn will typically be the last (i.e., -COOH end) residue of the respective peptides, and the removal of said residue X1, X2, ... or Xn will involve cleavage of the peptide bond N-terminally adjacent to said residue X1, X2, ... or Xn.
  • the term "specifically remove" means that the agent will remove the one or more amino acid residue types X1, X2, ... or Xn from peptides comprising such, whereas substantially all peptides that do not include the one or more amino acid residues X1, X2, ... or Xn will not be altered by said agent.
  • Such specificity may be achieved, e.g., when the agent at the conditions of the reaction removes the one or more amino acid residues X1, X2, ... or Xn but substantially not any other type of amino acid residues, or when amino acid residues other than the one or more X1, X2, ... or Xn, that would normally be removed by said agent, are suitably blocked in the protein (P), protein mixture (PM) or protein peptide mixture (PPM) before said reacting.
  • said reacting is quantitative, i.e., as a result thereof, substantially all peptides of the protein peptide mixture (PPM) that comprise the one or more amino acid residues X1, X2, ... or Xn will become altered with the agent used. Moreover, as a result of said reacting, substantially all of the one or more amino acid residues X1, X2, ... and Xn present in peptides of the protein peptide mixture (PPM) can be removed.
  • alter refers to the introduction of a specific change to said peptide by reacting the peptide with agents of the invention as defined herein. Such alteration may involve a specific chemical and/or enzymatic modification to or removal of one or more amino acids of a peptide. Preferably, introduction of said specific alteration allows the altered and unaltered peptides to be subsequently distinguished and/or separated by suitable methods, such as, e.g., by chromatography.
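The specific-removal route described above can be sketched as a simple model in which an agent takes off a C-terminal residue of the chosen types and every peptide is flagged as altered or unaltered. The function names are hypothetical, and removing exactly one residue per peptide is a simplifying assumption; how a real carboxypeptidase treats successive C-terminal residues is not modelled.

```python
def remove_c_terminal(peptide: str, residues: tuple[str, ...] = ("K", "R")) -> tuple[str, bool]:
    """Remove one C-terminal residue of the chosen types; report whether the peptide was altered."""
    if peptide and peptide[-1] in residues:
        return peptide[:-1], True
    return peptide, False


def partition(peptides: list[str], residues: tuple[str, ...] = ("K", "R")) -> tuple[list[str], list[str]]:
    """Split peptides into (altered, unaltered) after the idealised removal step."""
    altered, unaltered = [], []
    for peptide in peptides:
        product, changed = remove_c_terminal(peptide, residues)
        (altered if changed else unaltered).append(product)
    return altered, unaltered


print(partition(["MAK", "TGR", "LSE"]))
```

In this model the unaltered fraction corresponds to the peptides that a subsequent separation step would be expected to distinguish from the altered ones.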
  • suitable agent capable of specifically modifying the one or more amino acid residue types X1, X2, ... Xn in peptides of the protein peptide mixture (PPM), as taught herein.
  • exemplary modifying agents may comprise 1-fluoro-2,4-dinitrobenzene (e.g., reacting to dinitrophenyl-Lys), trinitrobenzene sulphonic acid (e.g., reacting to trinitrophenyl-Lys), ethylthiotrifluoroacetate (e.g., reacting to trifluoroacetyl-Lys) or succinic anhydride (e.g., reacting to succinyl-Lys); and preferably O-methylisourea (e.g., guanidinylation of the side chain -NH2 groups) which preferentially modifies the side chain -NH2 groups and does not
  • exemplary modifying agents may comprise a dicarbonyl compound or derivative thereof, a peptidylarginine deiminase or an arginase; when one of the one or more amino acid residues X1, X2, ...
  • Xn is a residue comprising a side chain mercapto group, such as, e.g., Cys or homocysteine
  • exemplary modifying agents may comprise iodoacetate (e.g., reacting to S-carboxymethyl-Cys), 1-fluoro-2,4-dinitrobenzene (e.g., reacting to S-dinitrophenyl-Cys), N-ethylmaleimide, p-hydroxymercuribenzoate, 5,5'-dithiobis(2-nitrobenzoic acid), or performic acid (e.g., reacting to cysteic acid); when one of the one or more amino acid residues X1, X2, ...
  • Xn is a residue comprising a side chain alkylthio group, preferably methylthio group, such as, e.g., Met
  • exemplary modifying agents may comprise cyanogen bromide (reacting to peptidyl homoserine lactone), iodoacetate (reacting to S-carboxymethyl-Met) or performic acid (reacting to methionine sulphone);
  • exemplary modifying agents may comprise diazomethane (reacting to methyl ester) or glycine methyl ester (reacting to an amide); when one of the one or more amino acid residues X1, X2, ...
  • Xn is a residue comprising a side chain imidazyl group, such as, e.g., His or 3-methylhistidine, preferably His
  • exemplary modifying agents may comprise iodoacetate or diethylpyrocarbonate (reacting to ethylcarboxamido-His); when one of the one or more amino acid residues X1, X2, ...
  • Xn is a residue comprising a side chain indyl group, such as, e.g., preferably Trp, exemplary modifying agents may comprise 2,4-dinitrophenylsulphenyl chloride or N-bromosuccinimide; when one of the one or more amino acid residues X1, X2, ... Xn is a residue comprising a side chain hydroxyphenyl group, such as, e.g., Tyr, exemplary modifying agents may comprise tetranitromethane (reacting to 3-nitrotyrosine).
  • the invention further also contemplates any suitable agent capable of specifically removing the one or more amino acid residue types X1, X2, ... Xn from peptides of the protein peptide mixture (PPM), as taught herein.
  • the following exemplary embodiments can serve as further guidance to selection of such suitable agents: when one of the one or more amino acid residues X1, X2, ... Xn is Arg, Lys or hArg, preferably Arg or Lys, and said residue is the last (i.e., -COOH end) residue of a peptide, exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase B (EC 3.4.17.2; see, e.g., Folk 1970.
  • Carboxypeptidase B, also known as protaminase or pancreatic carboxypeptidase B, has been isolated from a variety of sources, such as pancreas of cattle, pig and dogfish, etc., and all its origins and forms, including any recombinantly produced forms thereof, are contemplated for use herein; when one of the one or more amino acid residues X1, X2, ...
  • Xn is a basic amino acid, preferably Arg or Lys, more preferably Lys, and said residue is the last residue of a peptide
  • exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase N (EC 3.4.17.3; see, e.g., Plummer & Erdös 1981.
  • exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase G (EC 3.4.17.11; see, e.g., Goldman & Levy 1967. PNAS 58: 1299-1306).
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to amino acid residue(s) that comprise a guanidino moiety.
  • the fragmentation may be C-terminally adjacent to Arg and/or hArg (wherein hArg may be obtained in the proteins by suitably converting the side chains of lysine residues thereto before proteolysis), and/or C-terminally adjacent to Lys, wherein the lysine may be advantageously converted to hArg subsequent to said fragmentation.
  • Lys can be converted to hArg by methods known in the art, such as, e.g., by guanidinylation of the ε-NH2 groups of Lys with O-methylisourea, e.g., as disclosed in Plapp et al. 1971 (J Biol Chem 246: 939-945).
  • the protein (P) or protein mixture (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to:
  • Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin, which cleave highly specifically after Arg and Lys and, to a lesser extent, after hArg. If cleavage after Lys is not desired, as in some of the above embodiments, Lys residues may be suitably blocked before said proteolysis, as described elsewhere in this specification.
  • the invention also contemplates the use of any trypsin-like protease, i.e., with a similar specificity to that of trypsin.
  • peptides comprising a guanidino moiety will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides lacking a guanidino moiety will mostly originate from and comprise the C-terminal ends of the starting protein(s).
  • the invention contemplates particularly advantageous manners to modify peptides that include residues with a guanidino moiety.
  • the protein (P) or protein mixture (PM) may be subjected to chemical and/or enzymatic pre-treatment before fragmentation.
  • the protein peptide mixture (PPM) may be subjected to chemical and/or enzymatic pre-treatment before the primary run or before reacting with an agent of the invention.
  • pre-treatments may allow broadening of the spectrum of classes of peptides which can be isolated with the invention.
  • pre-treatments may alter the specificity of the fragmentation, or may block reactive groups of the peptide to prevent their reactivity under the conditions of the method, or may affect the modification of peptides with the agent of the invention, etc.
  • the protein (P), mixture of proteins (PM) or protein peptide mixture (PPM) may be exposed to one or more blocking reagents, simultaneously or sequentially, which reagents may preferably fall into the following classes: modifiers of protein primary amines; modifiers of protein primary amines only present in amino acid side chains; or modifiers of cysteine residues.
  • Suitable blocking reagents as well as methods and conditions for attaching the blocking groups will be clear to the skilled person and are generally described in the standard handbooks of organic chemistry, such as Greene and Wuts, "Protective groups in organic synthesis", 3rd Edition, Wiley and Sons, 1999, which is incorporated herein by reference in its entirety.
  • ε-NH2 groups of lysine residues can be converted to guanidino groups using O-methylisourea, before fragmentation C-terminally to arginine and homoarginine residues, preferably using trypsin type protease.
  • lysine residues may be so-converted to homoarginine only after fragmentation C-terminally adjacent to arginine and lysine residues, e.g., using trypsin type protease. This allows the combination of the advantageous digestion with trypsin with the ability to modify substantially all non-C-terminal peptides with agents of the invention that specifically modify guanidino groups.
  • primary amines, in particular the ε-NH2 groups of lysine and/or the N-terminal α-NH2 group, will be blocked with a blocking reagent that reacts with primary amines and presents a non-reactive substituent for subsequent steps.
  • the blocking reagent may be generally substituted once or twice on each primary amine.
  • primary amines may be blocked with acetyl N-hydroxysuccinimide resulting in acetylation of the primary amines, as known in the art.
  • ε-NH2 groups of lysines may be so-blocked in the protein (P) or protein mixture (PM) prior to fragmentation with a trypsin-type endoprotease.
  • acetylation avoids the need to guanidinylate lysine residues to homoarginine and also produces a different variety of peptides.
  • Other agents known to modify α-NH2 and/or ε-NH2 groups in proteins may include, e.g., 1-fluoro-2,4-dinitrobenzene (FDNB), trinitrobenzene sulphonic acid, ethylthiotrifluoroacetate and succinic anhydride.
  • the -SH groups of cysteine side chains of the protein (P) or protein mixture (PM) or of the peptides of protein peptide mixture (PPM) may be blocked, to avoid their reactivity, e.g., susceptibility to oxidation, in subsequent steps of the method.
  • the blocking reagent can be any that reacts selectively with cysteine side chains and presents a non-reactive substituent for subsequent reactions.
  • the sample may be treated with tributylphosphine, followed by iodoacetamide in protein denaturing buffers, leading to acetamide derivatisation of cysteine-side chains; otherwise, cysteine -SH groups can be blocked by alkylation as known in the art.
  • Other agents known to modify -SH groups of cysteine in proteins may include, e.g., 1-fluoro-2,4-dinitrobenzene (FDNB) and N-ethylmaleimide.
  • the treatment with a primary amine blocker may occur prior to treatment with an -SH-group blocker, or vice versa.
  • the resulting sample may then optionally be purified, using techniques known in the art, such as evaporation of solvent, washing, filtration, and chromatographic techniques, such as column chromatography (e.g., disposable preparative cartridge), etc.
  • peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety may be modified by reacting with a dicarbonyl compound or derivative thereof.
  • the dicarbonyl compound may comprise two carbonyl groups.
  • said at least two carbonyl groups may be two aldehyde groups (dialdehyde), two ketone groups (diketone), or one aldehyde and one ketone group (aldehyde-ketone), or at least one of said at least two carbonyl groups may be a part of a carboxyl group (e.g., aldehyde-acid, keto-acid), an ester group (e.g., aldehyde-ester, keto-ester) or a thioester group.
  • the dicarbonyl compound may be a dialdehyde, a diketone or an aldehyde-ketone compound.
  • derivative generally denotes that one or more atoms of said dicarbonyl compound may be substituted with one or more same or different functional groups.
  • said dicarbonyl compound or a derivative thereof is a molecule of the formula (I):
  • X is chosen from the group comprising or consisting of:
  • C1-4 alkylene optionally substituted with one or more hydroxy, oxo, formyl, carboxy, mercapto, fluoro, chloro, bromo, amino, substituted amino (-NRaRb wherein Ra and Rb are each independently hydrogen or C1-4 alkyl), nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, C1-4 alkyloxycarbonyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, or heterocyclyl C1-4 alkyl;
  • R1 and R2 are, each independently, chosen from the group comprising or consisting of:
  • Rc is, each independently, hydroxy, oxo, formyl, carboxy, mercapto, fluoro, chloro, bromo, amino, substituted amino, nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, C1-4 alkyloxycarbonyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl or heterocyclyl C1-4 alkyl; "Rd" is, each independently, hydroxy, fluoro, chloro, bromo, nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, aryl or
  • in Rc, Rd, Re and Rf, C1-4 alkyl, alone or as a part of another group, may preferably be C1-3 alkyl, more preferably methyl or ethyl;
  • C1-4 haloalkyl, alone or as a part of another group, may preferably be C1-3 haloalkyl, more preferably halomethyl or haloethyl;
  • C1-4 perhaloalkyl, alone or as a part of another group, may preferably be C1-3 perhaloalkyl, more preferably perhalomethyl or perhaloethyl;
  • C1-4 alkoxy, alone or as a part of another group, may preferably be C1-3 alkoxy, more preferably methoxy or ethoxy;
  • C1-4 alkanoyl, alone or as a part of another group, may preferably be C1-3 alkanoyl, more preferably formyl or acetyl;
  • X is a single bond.
  • X is C1-4 alkylene, preferably C1-3 alkylene, more preferably C1-2 alkylene, e.g., methylene or ethylene, even more preferably methylene, each group optionally substituted with one or more Rc, preferably with one or more Rd, more preferably with one or more Re, even more preferably with one or more Rf, yet more preferably with one or more Rh, and most preferably with nitro.
  • X is methylene substituted with nitro, preferably with only one nitro moiety, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof, such as, e.g., -C⁻(-NO2)-.
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of:
  • each R1 and R2 is hydrogen (-H); and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4", preferably "E3" or "E4", more preferably "E4".
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl, heteroaryl, aryl C1-4 alkyl, heteroaryl C1.
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl, heteroaryl, aryl C1-4 alkyl and heteroaryl C1-4 alkyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl and heteroaryl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, aryl and heteroaryl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
  • R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, aryl, preferably phenyl or naphthyl, more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
  • In any of the above embodiments "E5", "E6", "E7", "E8", "E9", "E10" or "E11":
  • - C1-6 alkyl may preferably be C1-4 alkyl, more preferably C1-3 alkyl and even more preferably methyl or ethyl; and/or
  • - C1-6 alkanoyl may preferably be C1-4 alkanoyl, more preferably C1-3 alkanoyl, even more preferably formyl or acetyl; and/or
  • - C1-4 alkyl, alone or as a part of a group, may preferably be C1-3 alkyl and more preferably methyl or ethyl; and/or
  • - C1-4 alkanoyl, alone or as a part of a group, may preferably be C1-3 alkanoyl, more preferably formyl or acetyl; and/or
  • - aryl, alone or as a part of a group, may preferably be phenyl or naphthyl, more preferably phenyl; and/or
  • - the cycle or cycles of said heteroaryl may have 5 or 6 ring members and/or the heteroaryl may comprise one or more N- and/or one or more O- heteroatoms; and/or
  • - the cycle or cycles of said heterocyclyl may have 5 or 6 ring members and/or the heterocyclyl may comprise one or more N- and/or one or more O- heteroatoms.
  • At least one of R1 and R2 is hydrogen, preferably only one of R1 and R2 is hydrogen, and the other of R1 and R2 is chosen from the groups as defined in any of the preceding embodiments applicable to substituents R1 and R2; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
  • neither R1 nor R2 is hydrogen; and R1 and R2 are chosen from the groups as defined in any of the preceding embodiments applicable to substituents R1 and R2; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
  • At least one of R1 and R2 is chosen from the group comprising or consisting of aryl and heteroaryl, and is preferably aryl, more preferably naphthyl or phenyl, even more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
  • one of R1 and R2 is hydrogen and the other of R1 and R2 is chosen from the group comprising or consisting of aryl and heteroaryl, and is preferably aryl, more preferably naphthyl or phenyl, even more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
  • It shall be appreciated that the dicarbonyl compound or a derivative thereof of formula (I) can include various combinations of substituents X, R1 and R2 as constructed by combining the above embodiments.
  • X is a single bond
  • R1 and R2 are, each independently, chosen from the group comprising or consisting of: hydrogen and aryl, preferably phenyl or naphthyl, more preferably phenyl, each group being optionally substituted with one or more Re, preferably with one or more Rf, more preferably with one or more Rg.
  • one, more preferably only one, or alternatively neither of R1 and R2 may be hydrogen.
  • the dicarbonyl compound or a derivative thereof is phenylglyoxal.
  • it is hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal.
  • X is methylene substituted with one or more nitro, preferably with one nitro, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof;
  • R1 and R2 are, each independently, as defined in any of embodiments "E5" to "E15" as above, preferably at least one of R1 and R2 is hydrogen.
  • X is methylene substituted with one or more nitro, preferably with one nitro, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof; R1 and R2 are each hydrogen.
  • the dicarbonyl compound is nitromalondialdehyde (NMA), i.e., X is -CH(-NO2)- or any protonated or dissociated form thereof, e.g., -C⁻(-NO2)-; and R1 and R2 are each hydrogen.
  • conjugated system refers to a system where a sequence of three or more atoms exhibits delocalised bonding over said three or more atoms, especially delocalisation of electrons across adjacent parallel aligned p-orbitals.
  • the dicarbonyl compound or derivative thereof may display stereoisomerism with cis- or trans-arrangement of substituents R1 and R2, as generally illustrated in the above formulas (III) and (II), respectively.
  • the moieties R1 and R2, e.g., in the dicarbonyl compound or derivative thereof as individualised in any of the above embodiments, are in trans as set forth in formula (II) above, which may provide for less steric hindrance therebetween.
  • the dicarbonyl compound or a derivative thereof may be trans-hydroxyphenylglyoxal (IV):
  • more preferably trans-p-hydroxyphenylglyoxal (V), i.e.:
  • the dicarbonyl compound or derivative may form a stable adduct with the guanidino moiety, preferably an adduct as set forth in any of formulas (VIa) or (VIb), or (VIIa) or (VIIb), including any protonated or dissociated forms thereof:
  • a preferred adduct may be the one set forth in formula (VIa) or (VIb) above; this type of adduct may show advantages, such as, e.g., greater stability and/or a greater impact on the properties of the altered peptides due to the presence of two molecules of the dicarbonyl compound or derivative therein.
  • a preferred adduct may be any of the adducts set forth in formulas (VIIa) or (VIIb); this type of adduct or further products thereof may show advantages, such as, e.g., greater stability and/or a greater impact on the properties of the altered peptides.
  • the adducts as set forth in any of formulas (VIIa) or (VIIb) may undergo dehydration, whereby one or more water molecules are eliminated from the adducts.
  • By way of example and not limitation, reaction of an arginine residue (via its guanidino moiety) with nitromalondialdehyde (NMA), a preferred dicarbonyl compound of the invention, will eventually produce a δ-(5-nitro-2-pyrimidyl)ornithyl derivative (see Figure 3).
  • adducts as set forth in formula (VIa) or (VIb) may be preferably obtained by reacting peptides of a protein peptide mixture with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E10", "E11"; also preferably "E14", "E15"; more preferably "E16" to "E19", such as, e.g., phenylglyoxal, preferably hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal, such as, preferably, trans-phenylglyoxal, more preferably trans-hydroxyphenylglyoxal, and even more preferably trans-p-hydroxyphenylglyoxal.
  • adducts as set forth in formulas (VIIa) or (VIIb), or further dehydration products thereof as explained, may be preferably obtained by reacting peptides of a protein peptide mixture with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E4" and "E6", more preferably "E20" to "E22", most preferably with nitromalondialdehyde (NMA).
  • guanidino moieties of peptides will form suitable adducts with the dicarbonyl compound or derivative thereof, such as, e.g., adducts as shown in formulas (VIa) or (VIb), or (VIIa) or (VIIb).
  • Such conditions may include, without limitation, concentrations of the reactants, relative molar excess of the reactants, solvent, pH, buffer system, temperature, reaction time, presence of catalysts, stopping or quenching the reaction, etc.
  • reaction conditions to obtain adducts between peptides and the dicarbonyl compound or derivative thereof may include any or all of the following:
  • - aqueous solvent, preferably water;
  • - pH between 7 and 11, preferably between 7 and 10, e.g., about 7, about 8, about 9 or about 10, more preferably between 8 and 10, and even more preferably about 9, e.g., 9;
  • - temperature between 5°C and 80°C, preferably between 10°C and 60°C, more preferably between 15°C and 40°C, even more preferably between 20°C and 30°C, yet more preferably between 20°C and 25°C or between 25°C and 30°C, such as, e.g., ambient temperature or about 30°C;
  • - reaction in darkness;
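The nested pH and temperature preferences listed above can be read as tiers, from the broadest range to the narrowest. The following is only a sketch of that reading: the function name `preference_tier` is hypothetical, and only the closed ranges stated in the text are encoded (the point preferences such as "about 9" are left out).

```python
# Nested ranges from the text, ordered from broadest to narrowest.
PH_TIERS = [(7, 11), (7, 10), (8, 10)]
TEMP_TIERS_C = [(5, 80), (10, 60), (15, 40), (20, 30)]


def preference_tier(value: float, tiers: list) -> int:
    """Count how many nested ranges contain `value` (0 means outside the broadest range)."""
    return sum(1 for low, high in tiers if low <= value <= high)


# Hypothetical candidate conditions.
print(preference_tier(9.0, PH_TIERS))       # pH 9 lies in every listed pH range
print(preference_tier(25.0, TEMP_TIERS_C))  # 25 degrees C lies in every listed range
print(preference_tier(90.0, TEMP_TIERS_C))  # outside the broadest temperature range
```

A higher count simply means the value falls within more of the stated preferred ranges; it is not a measure of reaction yield.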
  • peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety may be specifically modified by reacting with peptidylarginine deiminase (EC 3.5.3.15; see, e.g., Fujisaki & Sugawara 1981. J Biochem (Tokyo) 89: 257-263).
  • Peptidylarginine deiminases catalyse enzymatic deimination of the guanidino group, most preferably of Arg residues, to a ureido group (Arg → citrulline).
  • peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety may be specifically modified by reacting with arginase (EC 3.5.3.1; see, e.g., Bach & Killip 1961. Biochim Biophys Acta 47: 336-343; Greenberg 1960, Arginase, in Boyer et al., Eds., The Enzymes, 2nd edn., vol. 4, Academic Press, New York, pp. 257-267).
  • Arginases catalyse enzymatic hydrolysis of the guanidino group, most preferably of Arg residues, to an amino group (Arg → ornithine).
  • the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to basic amino acid residue(s), more preferably C-terminally adjacent to Arg and/or hArg and/or Lys, even more preferably C-terminally adjacent to Arg and/or Lys.
  • Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin. If cleavage after Lys is not desired, as in some of the above embodiments, Lys residues may be suitably blocked before said proteolysis, as described elsewhere in this specification.
  • peptides comprising a basic amino acid (preferably Arg, hArg or Lys, more preferably Arg or Lys) as their last residue will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides not having such basic amino acid as their last residue will mostly originate from and comprise the C-terminal ends of the starting protein(s).
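The reasoning above, that after trypsin-type cleavage only the peptide carrying the protein's C-terminal end normally lacks a basic last residue, can be sketched in a few lines. This is a simplified model and not the method of the specification: it ignores missed cleavages and the reduced cleavage before proline, and the sequence and function names are hypothetical.

```python
def tryptic_digest(protein: str, cleave_after: str = "KR") -> list[str]:
    """Cleave C-terminally to every residue in `cleave_after` (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # remainder carries the protein's C-terminal end
        peptides.append(protein[start:])
    return peptides


def split_by_last_residue(peptides: list[str], basic: str = "KR"):
    """Separate peptides ending in a basic residue from those that do not."""
    basic_ended = [p for p in peptides if p[-1] in basic]
    others = [p for p in peptides if p[-1] not in basic]
    return basic_ended, others


protein = "MASKTLLRGDEKAVLF"  # hypothetical sequence
basic_ended, c_terminal_candidates = split_by_last_residue(tryptic_digest(protein))
print(basic_ended, c_terminal_candidates)

# With Lys blocked before proteolysis, cleavage is modelled after Arg only.
print(tryptic_digest(protein, cleave_after="R"))
```

A protein whose own last residue is Lys or Arg would be misclassified by this sketch, which is consistent with the text's wording that such peptides "mostly" originate from the C-terminal ends.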
  • the invention contemplates particularly advantageous manners to specifically alter peptides that include such basic amino acids as their last residues.
  • peptides of the protein peptide mixture that comprise a basic amino acid, preferably Arg, Lys or hArg, even more preferably Arg or Lys, as their last residue may be modified by reacting with carboxypeptidase B, carboxypeptidase U or carboxypeptidase D, more preferably with carboxypeptidase B, which catalyse the specific removal of said last basic residue.
  • peptides of the protein peptide mixture that comprise a basic amino acid, preferably Lys, as their last residue may be modified by reacting with carboxypeptidase N, which catalyses the specific removal of said last basic residue.
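The effect of a carboxypeptidase B-type treatment can be sketched as removal of a basic last residue, which lowers the number of positively charged sites on the altered peptide while leaving C-terminal peptides untouched. The charge estimate below is an assumption for illustration: it counts the N-terminal amine plus Lys, Arg and His as protonated, as would roughly hold at strongly acidic pH, and both function names are hypothetical.

```python
def remove_c_terminal_basic(peptide: str, basic: str = "KR") -> str:
    """Model removal of the last residue only when it is basic."""
    if peptide and peptide[-1] in basic:
        return peptide[:-1]
    return peptide


def approx_positive_sites(peptide: str) -> int:
    """Rough count of protonated sites at acidic pH: N-terminus plus K, R and H."""
    return 1 + sum(peptide.count(residue) for residue in "KRH")


for peptide in ["TLLR", "GDEK", "AVLF"]:  # hypothetical peptides
    altered = remove_c_terminal_basic(peptide)
    print(peptide, "->", altered, approx_positive_sites(peptide), "->", approx_positive_sites(altered))
```

In this model the peptide without a basic last residue ("AVLF") is returned unchanged, which is what allows it to be told apart from the altered peptides afterwards.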
  • the above disclosed methods to specifically modify or remove the one or more amino acid residues X1, X2 ... Xn in or from peptides of the protein peptide mixture (PPM) will change one or more properties of the so-altered peptides, and thereby make it possible to distinguish between the altered and unaltered peptides and to specifically isolate or enrich for the subset (S) of the unaltered peptides.
  • said modification or removal of the one or more amino acid residues X1, X2 ... or Xn may change one or more chemical or physical characteristics of the altered peptides, or may change their interaction with a capture agent.
  • said modification or residue removal may change one or more following peptide characteristics:
  • a modified amino acid residue may be a stronger or weaker acid or base than the original residue, causing a difference in protonation and charge at particular pH values.
  • a modifying agent may comprise one or more acidic (such as, e.g., -COOH, -SO 2 H or -SO 3 H, etc.) or basic (such as, e.g., -NH 2 , basic N-containing heteroaryl or heterocyclyl) groups which can confer a charge onto the modified amino acid.
  • non-polar such as, e.g., alkyl, aryl, aryl alkyl, etc.
  • the modifying agent may introduce into the altered peptides a group with a strong affinity for binding with a capture agent / ligand, e.g., a biotin-streptavidin binding or hapten-antibody binding or metal ion complexation, whereby said affinity is conferred to altered peptides;
  • the modifying agent may introduce into the altered peptides a bulky, voluminous entity which can confer an increase in molecular size to the altered peptides;
  • the modification or removal of said one or more residue types X1, X2,... Xn may change, e.g., increase or decrease, hydrophobicity of the so-altered peptides.
  • reaction with the dicarbonyl compound or derivative thereof of formula (I) as defined herein may change, e.g., increase or decrease, hydrophobicity of the so altered peptides.
  • a modification which changes hydrophobicity / hydrophilicity of altered peptides comprising a guanidino group may result from reacting said peptides with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E10", "E11"; also preferably "E14", "E15"; more preferably "E16" to "E19", such as, e.g., phenylglyoxal, preferably hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal, such as, preferably, trans-phenylglyoxal, more preferably trans-hydroxyphenylglyoxal, and even more preferably trans-p-hydroxyphenylglyoxal; or as defined in any of above embodiments "E4" and "E16", more preferably "E20" to "E22", most preferably with nitromalondialdehyde (NMA).
  • said removal may change, in particular decrease, the net charge of said peptide.
  • such a change in the properties of the altered peptides can affect the chromatographic behaviour of the altered peptides, thus allowing the use of chromatography to distinguish between the altered and unaltered peptides and specifically to isolate or enrich for the subset of unaltered peptides, i.e., peptides comprising C-terminal ends from the protein (P) or protein mixture (PM).
  • the changed properties of the altered peptides can affect the differential distribution of said altered peptides between the mobile and stationary phase in chromatography, as compared to the unaltered peptides.
  • the altered peptides have a different, e.g., shorter or longer, elution time than the unaltered peptides.
  • the type of chromatography suitable for separating the altered and unaltered peptides and isolating the subset (S) of unaltered peptides can be advantageously chosen based on the nature of the change in the properties of the altered peptides resulting from their modification using the agents of the invention. While such choice can be made by a skilled person armed with the present teachings, following examples are provided by further guidance and not limitation.
  • ion exchange chromatography can be advantageously used.
  • reverse phase chromatography e.g., RP-HPLC, or hydrophobic interaction chromatography may be advantageously used.
  • affinity chromatography e.g., immuno-affinity chromatography or immobilized metal affinity chromatography
  • said amino acid modification or removal changes the hydrophobicity, e.g., increases or decreases the hydrophobicity, of the altered peptides compared to the unaltered peptides, and reverse phase chromatography, preferably RP-HPLC, is used for separating and isolating the subset (S) of unaltered peptides.
  • said amino acid modification or removal changes the hydrophilicity, e.g., increases or decreases the hydrophilicity, of the altered peptides compared to the unaltered peptides, and reverse phase chromatography, preferably RP-HPLC, is used for separating and isolating the subset (S) of unaltered peptides.
  • electrophoresis including capillary electrophoresis, free flow electrophoresis, capillary zone electrophoresis, capillary electro-chromatography, capillary isoelectric focusing and affinity electrophoresis, as known in the art.
  • the step (b) of present method may comprise steps (ba) to (be): (ba) separating the protein peptide mixture (PPM) into fractions of peptides via chromatography, (bb) reacting at least one and preferably each peptide fraction from step (ba) with an agent capable of specifically modifying or removing said one or more amino acid residue types X 1 , X 2 ,...
  • The chromatographic separation in step (ba), i.e., prior to reacting the peptide fractions with agents of the invention, is referred to herein as the "primary run" or the "primary chromatographic step" or the "primary chromatographic separation" or "run 1".
  • the "same type of chromatography" means that the first and second chromatographic separations are of the same type, in particular they are both configured to separate the peptides on the basis of the same property, e.g., a physical and/or chemical property of the peptides.
  • the primary and secondary runs can both separate the peptides on the basis of their hydrophobicity (e.g., the primary and secondary runs can both be hydrophobicity chromatography, preferably both be RP-chromatography, more preferably both be RP-HPLC chromatography); or the primary and secondary runs can both separate the peptides on the basis of their net charge (e.g., the primary and secondary runs can both be ion exchange chromatography, preferably both be cation exchange chromatography or preferably both be anion exchange chromatography); or the primary and secondary runs can both separate the peptides on the basis of their bulk size (e.g., the primary and secondary runs can both be size exclusion chromatography), etc.
  • the agents of the invention change peptides such as to change the property or properties distinguished by said chromatography
  • the altered peptides will show a migration shift in the secondary run vis-a-vis the primary run, which allows for the separation of the altered and unaltered peptides and for isolation of the latter from the peptide fraction.
  • Separating the protein peptide mixture into fractions in the primary run and analysing each fraction separately (or, alternatively, analysing suitably pooled fractions, see below) in the secondary run advantageously ensures that the altered peptides from a particular fraction do not co-migrate with, and therefore can be distinguished from, unaltered peptides of one or more other fractions.
  • the method contemplates that each fraction from the primary run, having been subjected to reaction with the agents of the invention, can be separated in the secondary run into a fraction containing the unaltered peptides and a fraction containing the altered peptides.
  • the elution time windows (the elution time window of a fraction being the time window within which at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% or at least 99% of the peptides of said fraction elute) for said fraction containing the unaltered peptides and for said fraction containing the altered peptides do not overlap. This allows for unambiguous isolation of the subset of unaltered peptides.
  • the time distance between the respective elution time windows for the fraction containing the unaltered peptides and for the fraction containing the altered peptides equals at least 0.25x, more preferably at least 0.5x, even more preferably at least 0.75x, yet more preferably at least 1x, and still more preferably at least 1.5x, or 2x or more, of the duration of the elution time window for the fraction containing the unaltered peptides.
  • see WO 02/077016 (p. 12, l. 20 through p. 14, l. 5; p. 18, l. 5-32; and Figure 1), incorporated herein by reference.
  • said time distance between the fraction containing the unaltered peptides and that containing the altered peptides will depend on a number of factors, such as, e.g., the agent used for peptide modification; the width of the fraction from the primary run subjected to the secondary run, the chromatographic conditions (e.g., type of stationary and mobile phase, buffers, etc.), and the like, which can be generally optimised by a skilled person armed with the present teachings.
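The non-overlap requirement and the time distance criterion described above can be expressed as a short calculation. This is a minimal sketch under the assumption that each elution time window is given as a (start, end) pair in minutes; the function names and example windows are hypothetical.

```python
def windows_overlap(a: tuple, b: tuple) -> bool:
    """True when two (start, end) elution time windows share any interval."""
    return a[0] < b[1] and b[0] < a[1]


def gap_in_window_units(unaltered: tuple, altered: tuple) -> float:
    """Time distance between the windows as a multiple of the unaltered window duration."""
    if windows_overlap(unaltered, altered):
        return 0.0
    gap = max(altered[0] - unaltered[1], unaltered[0] - altered[1])
    return gap / (unaltered[1] - unaltered[0])


unaltered_window = (20.0, 24.0)  # hypothetical secondary-run window, minutes
altered_window = (27.0, 31.0)    # altered peptides elute later in this example

factor = gap_in_window_units(unaltered_window, altered_window)
print(factor, factor >= 0.25)
```

The returned factor can be compared directly with the thresholds quoted in the text (0.25x, 0.5x, 0.75x, 1x, 1.5x, 2x); the formula handles altered peptides that elute either earlier or later than the unaltered ones.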
  • the chromatographic conditions of the primary run and the secondary run are identical or, for a person skilled in the art, substantially similar.
  • substantially similar means for instance that small changes in flow and/or gradient and/or temperature and/or pressure and/or chromatographic beads and/or solvent composition, or the like, are tolerated between run 1 and run 2 as long as the chromatographic conditions lead to an elution of the altered peptides that is predictably distinct from the unaltered peptides and this for every fraction collected from run 1.
  • the reaction with the agents of the invention should be preferably effective in each of the so analysed peptide fractions from the primary run, such that in each fraction obtained from said primary run, the altered peptides will migrate distinctly from the unaltered peptides in the secondary chromatographic step.
  • each fraction from the primary run may be individually reacted with an agent of the invention and subsequently individually separated into altered and unaltered peptides in the secondary run.
  • the method may be streamlined if said reacting and/or the secondary chromatographic run is performed with pools of two or more fractions from the primary run.
  • each of said fractions yields, in the secondary run, fractions of altered peptides and unaltered peptides, the respective elution time windows of which should not overlap (see above).
  • the elution time window of any fraction obtained in the secondary run from a given primary run-fraction should not overlap with the elution time window of any fraction obtained in the secondary run from any other primary run-fraction in the pool.
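The pooling constraint above, that no secondary-run window of one primary fraction may overlap a window of any other fraction in the same pool, can be illustrated with a simple greedy assignment. This is only a sketch and not the pooling scheme of the specification: the window values, the uniform 2-minute shift of altered peptides and the function names are assumptions made for the example.

```python
def overlaps(a: tuple, b: tuple) -> bool:
    """True when two (start, end) windows share any interval."""
    return a[0] < b[1] and b[0] < a[1]


def pool_fractions(fractions: dict) -> list:
    """Greedily place each fraction in the first pool where none of its windows overlap."""
    pools = []  # each pool is a list of fraction names
    for name, windows in fractions.items():
        for pool in pools:
            taken = [w for other in pool for w in fractions[other]]
            if not any(overlaps(w, t) for w in windows for t in taken):
                pool.append(name)
                break
        else:  # no existing pool can take this fraction
            pools.append([name])
    return pools


# Hypothetical data: 1-minute primary fractions; in run 2 the unaltered peptides
# keep their window and the altered peptides elute 2 minutes later.
fractions = {f"F{i}": [(i, i + 1), (i + 2, i + 3)] for i in range(1, 9)}
pools = pool_fractions(fractions)
print(pools)
```

Windows that merely touch are treated as non-overlapping here; in practice a safety margin between windows would likely be added, in line with the preferred time distances given earlier in the text.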
  • buffers and/or solvents used in both chromatographic steps are compatible with the conditions required to allow an efficient alteration of peptides with an agent of the invention.
  • the nature of the solvents and buffer in the primary run, the secondary run and the alteration step are identical or substantially similar.
  • said buffers and solvents are compatible with the conditions required to perform a mass spectrometric analysis.
  • reaction with the dicarbonyl compound or derivative may require specific reaction conditions which are not compatible with the buffers used in the primary and/or secondary run; such conditions can be suitably changed before the alteration step and/or after the alteration step, the change being performed by methods described in the art such as, for example, an extraction, a lyophilisation and redissolving step, a precipitation and redissolving step, a dialysis against an appropriate buffer/solvent or even a fast reverse phase separation with a steep gradient, etc.
  • application of a pre-treatment step as mentioned herein above may require such changing in buffers or conditions before the first run.
  • Such changing in buffers or conditions may also be required before analysis of the peptides after the secondary run, etc.
  • the invention is also directed to a peptide sorter device that is able to carry out the methods of the invention, in particular the methods as above comprising primary and secondary chromatographic runs of the same type wherein the peptide fractions from run 1 are modified with an agent of the invention before separation in run 2.
  • peptide sorter refers to a device that efficiently separates unaltered peptides from the altered peptides.
  • identical or very similar chromatographic conditions are used in the peptide sorter in the two chromatographic runs such that during the secondary run the unaltered peptides stay at their original elution times and the altered peptides undergo a shift in the elution time.
  • a peptide sorter particularly refers to the pooling of fractions obtained after run 1 and the optimal organisation of the second chromatographic step to speed up the isolation of the unaltered peptides out of each of the run 1 fractions.
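The sorting principle described above, in which unaltered peptides keep their original elution time in the secondary run while altered peptides shift, can be sketched as a simple classification by retention time. The peptide labels, times, tolerance parameter and function name below are hypothetical and serve only to illustrate the idea.

```python
def classify_by_shift(run1_window: tuple, run2_times: dict, tolerance: float = 0.0):
    """Treat peptides eluting in run 2 inside the original run-1 window as unaltered."""
    low, high = run1_window
    unaltered = {
        peptide
        for peptide, time in run2_times.items()
        if low - tolerance <= time <= high + tolerance
    }
    altered = set(run2_times) - unaltered
    return unaltered, altered


# Hypothetical fraction collected between 20 and 21 minutes in run 1,
# with the run-2 elution times observed after reaction with the agent.
run2_times = {"pepA": 20.4, "pepB": 26.1, "pepC": 20.9}
unaltered, altered = classify_by_shift((20.0, 21.0), run2_times)
print(sorted(unaltered), sorted(altered))
```

The `tolerance` argument reflects the text's allowance for "identical or very similar" conditions between the two runs, where small retention time drift may occur.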
  • the invention relates to a system for sorting peptides comprising: a primary chromatographic column for separating a protein peptide mixture (PPM) into a plurality of fractions under a defined set of conditions, whereby each fraction is subsequently reacted with an agent of the invention as defined herein, and wherein the so-reacted fractions are pooled into a set of pooled fractions, each pooled fraction comprising at least two so-reacted fractions; and a set of secondary chromatographic columns comprising a first secondary chromatographic column for separating a first pooled fraction and at least a second secondary chromatographic column arranged in parallel with the first secondary chromatographic column for separating a second pooled fraction; wherein the set of secondary chromatographic columns performs isolation of the unaltered peptides under substantially identical conditions as the defined set of conditions, whereby there is no elution overlap between i) the unaltered peptides from different fractions within one pool or between
  • the said defined set of conditions is configured to maximise the chromatography separation between peptides modified with the respective agent of the invention, from the unaltered peptides.
  • said set of conditions may be optimised for situations wherein the agent is a dicarbonyl compound or derivative thereof of formula (I) as defined herein which modifies a guanidino group, preferably of Arg and/or hArg, more preferably forms an adduct of any of formulas (VIa) or (VIb), or (VIIa) or (VIIb); or wherein the agent is peptidylarginine deiminase or arginase, which modify a guanidino group, preferably of Arg; or wherein the agent removes a basic last residue, such as carboxypeptidase B, U, D or N, preferably carboxypeptidase B.
  • the defined set of conditions involves hydrophobicity chromatography, preferably RP chromatography, more preferably RP-HPLC.
  • the invention also provides for methods as described above performed in conjunction with the peptide sorter devices as described herein.
  • peptide sorter devices which employ conditions that are specifically configured to achieve maximum separation of the unaltered peptides from peptides altered as taught herein, and uses thereof; further features and operation of said peptide sorters may be essentially as disclosed in WO 02/077016 (especially on p. 38, l. 15 through p. 54, l. 5, and p. 80, l. 23 through p. 88, l. 16, incorporated herein by reference).
  • the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of:
  • the conditions are such that the acidic side chain moiety of the majority of acidic amino acids of the peptides of above step (i), and particularly the -COOH moiety of aspartic acid and glutamic acid residues, is not dissociated. This can reduce the confounding effect of such acidic moieties on the method.
  • the conditions in above step (ii) encompass pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5, e.g., between 2.75 and 3.25, and still more preferably about 3, such as, e.g., 2.80, 2.85, 2.90, 2.95, 3.0, 3.05, 3.10, 3.15 or 3.20.
  • the fragmentation, e.g., proteolysis, in step (i) occurs preferentially at peptide bonds C-terminally adjacent to all types of basic amino acid residues.
  • Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin, as taught elsewhere in this specification.
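As a rough illustration of this fragmentation step, the sketch below performs an in silico trypsin-type digest and picks out the peptide that carries the C-terminal end of the protein. It assumes complete cleavage after Lys and Arg and applies the common rule of thumb that trypsin does not cleave before Pro. The example sequence and function names are hypothetical.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence C-terminally of K or R, except before P.

    Assumes complete digestion with no missed cleavages.
    """
    # Zero-width split points: preceded by K or R, not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [piece for piece in pieces if piece]


def c_terminal_peptide(protein: str) -> str:
    """Return the digest product that contains the protein's C-terminal end."""
    return tryptic_digest(protein)[-1]


protein = "MKTAYRPLGKAVDE"  # hypothetical sequence, for illustration only
print(tryptic_digest(protein))
print(c_terminal_peptide(protein))
```

In this example only the last peptide lacks a C-terminal basic residue, which is the property the charge-based isolation relies on.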
  • trypsin does not cleave after histidine. Therefore, a minor fraction of C-terminal peptides which contain a histidine residue may have an above zero net charge under the conditions of step (ii) and may not be isolated within the subset (S') of C-terminal peptides (this however does not diminish the usefulness of the method).
  • the charge of at least some or all of the basic amino acid types after which the proteolysis does not occur may be preferably neutralised by suitable modification, such as, e.g., by acetylation, such as, e.g., of lysines, or modification of His by diethylpyrocarbonate.
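The effect of pH and of residual basic residues on net charge can be estimated with the Henderson-Hasselbalch relation. The sketch below is only an approximation: the pKa values are generic textbook figures chosen for illustration (real values depend on sequence context), and whether the peptide N-terminus is blocked is passed in as an assumption by the caller.

```python
# Approximate pKa values (assumed, for illustration only).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 8.0
PKA_C_TERM = 3.3


def net_charge(peptide: str, ph: float, n_term_blocked: bool = False) -> float:
    """Estimate the net charge of a peptide at a given pH."""
    charge = 0.0
    if not n_term_blocked:
        # Fraction of the free alpha-amino group that is protonated.
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # Fraction of the C-terminal carboxyl group that is dissociated.
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in peptide:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


# Hypothetical peptides: one without basic residues, one with His, one with Arg and Lys.
for peptide in ("AVDE", "AVHDE", "TAYRPLGK"):
    print(peptide, round(net_charge(peptide, ph=3.0, n_term_blocked=True), 2))
```

With these assumed values, a His-containing peptide carries roughly one more positive charge at pH 3 than its His-free counterpart, which is why neutralising His (or Lys) by modification can matter for the charge-based isolation.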
  • said subset (S') of peptides can be isolated using a technique capable of separating peptides on the basis of net charge, including, but not limited to, ion exchange chromatography, isoelectric focusing of peptides, and zwitterionic ion exchange chromatography (see, e.g., WO 00/27496).
  • said subset (S') of peptides can be isolated using ion exchange chromatography. In a preferred embodiment, said subset (S') of peptides can be isolated using cation exchange chromatography.
  • positively charged peptides will be more strongly retained by the stationary phase, whereas substantially uncharged peptides, i.e., the C-terminal and optionally blocked N-terminal peptides of interest, will elute faster and can be recovered from said eluate.
  • said subset (S') of peptides can be isolated using strong cation exchange (SCX) chromatography.
  • the methods allow for the specific isolation of peptides comprising the C-terminal ends of proteins, e.g., proteins of complex protein mixtures such as biological samples.
  • Said peptides comprising C-terminal ends of proteins are highly representative of the originating proteins and as such serve as identification elements for their corresponding proteins.
  • the present invention therefore further provides a method to identify a subset of peptides isolated from a protein peptide mixture, in particular isolated peptides comprising C-terminal ends of proteins, and their corresponding proteins in a sample comprising proteins. Thereto the isolation of peptides, in particular peptides comprising C-terminal ends of proteins, according to any of the embodiments of the invention is further coupled to analysis of so isolated peptides.
  • peptide analysis of the isolated peptides is performed with a mass spectrometer.
  • said isolated peptides can also be further analysed and identified using other methods such as electrophoresis, activity measurement in assays, analysis with specific antibodies, Edman sequencing, etc.
  • an analysis or identification step can be carried out in different ways.
  • the isolated peptides e.g., eluting from a chromatographic column
  • isolated peptides are collected in fractions and fractions may or may not be manipulated before going into further analysis or identification.
  • An example of such manipulation consists of a concentration step, followed by spotting each concentrate on for instance, a MALDI-target for further analysis and identification.
  • the isolated peptides are analysed with high-throughput mass spectrometric techniques.
  • the information obtained is the mass of the isolated peptides.
  • Fourier transform mass spectrometer (FTMS)
  • an internal calibration procedure e.g., O'Connor and Costello 2000, Anal Chem 72: 5881-5885
  • a yet further piece of information that can be used to identify isolated peptides is the Grand Average of Hydropathicity (GRAVY) of the peptides, reflected in the elution times during chromatography. Two or more peptides, with identical masses or with masses that fall within the error range of the mass measurements, can be distinguished by comparing their experimentally determined GRAVY with the in silico predicted GRAVY.
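The in silico GRAVY prediction can be sketched as follows, using the Kyte-Doolittle hydropathy scale (the sum of residue hydropathy values divided by peptide length). The helper that picks the candidate closest to an experimentally derived value, and the example peptides, are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values per amino acid residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


def closest_by_gravy(candidates: list[str], observed_gravy: float) -> str:
    """Pick the candidate whose predicted GRAVY is nearest the observed value."""
    return min(candidates, key=lambda seq: abs(gravy(seq) - observed_gravy))


# Hypothetical isobaric candidates (Ile and Leu have the same mass).
candidates = ["AIDE", "ALDE"]
print({seq: round(gravy(seq), 3) for seq in candidates})
print(closest_by_gravy(candidates, observed_gravy=-0.2))
```

Because GRAVY depends only on amino acid composition, it cannot separate candidates that are sequence permutations of one another.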
  • Another piece of information to identify isolated peptides may be the normalised elution time (NET), see, e.g., Norbeck et al. 2005 (J Am Soc Mass Spectrom 16: 1239-49).
  • Any mass spectrometer may be used to analyze the isolated peptides.
  • mass spectrometers include the matrix-assisted laser desorption/ionization ("MALDI") time-of-flight ("TOF") mass spectrometer (MALDI-TOF-MS), available from PerSeptive Biosystems, Framingham, Massachusetts; the Ettan MALDI-TOF from AP Biotech and the Reflex III from Bruker-Daltonics, Bremen, Germany, for use in post-source decay analysis; the Electrospray Ionization (ESI) ion trap mass spectrometer, available from Finnigan MAT, San Jose, California; the ESI quadrupole mass spectrometer, available from Finnigan MAT, or the QSTAR Pulsar Hybrid LC/MS/MS system of Applied Biosystems Group, Foster City, California; and a Fourier transform mass spectrometer (FTMS) using an internal calibration procedure (O'Connor and Costello, 2000).
  • Protein identification software used in the present invention to compare the experimental mass spectra of the peptides with a database of the peptide masses and the corresponding proteins are available in the art.
  • One such program, ProFound, uses a Bayesian algorithm to search protein or DNA databases to identify the optimum match between the experimental data and the protein in the database.
  • ProFound may be accessed on the World-Wide Web at http://prowl.rockefeller.edu and http://www.proteometrics.com.
  • ProFound accesses the non-redundant database (NR).
  • Peptide Search can be accessed at the EMBL website. See also, Chaurand P. et al. (1999) J. Am. Soc. Mass. Spectrom 10, 91 , Patterson S.
  • MS/MS spectra may also be analysed by MASCOT (available at http://www.matrixscience.com, Matrix Science Ltd. London).
  • isolated peptides are individually subjected to fragmentation in the mass spectrometer. In this way information about the mass of the peptide is further complemented with (partial) sequence data about the peptide. Comparing this combined information with information in peptide mass and peptide and protein sequence databases allows identification of the peptides.
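The mass-based part of this comparison can be sketched as a lookup of an observed peptide mass against in silico candidate masses within a tolerance. The sketch uses standard monoisotopic residue masses and ignores any chemical modifications introduced by the method (for example acetylation), which would have to be added to the theoretical masses. The candidate list, tolerance and function names are illustrative assumptions.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water per peptide for the free termini


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def match_by_mass(observed: float, candidates: list[str],
                  tolerance_ppm: float = 10.0) -> list[str]:
    """Return candidates whose theoretical mass lies within the ppm tolerance."""
    hits = []
    for sequence in candidates:
        theoretical = peptide_mass(sequence)
        error_ppm = abs(observed - theoretical) / theoretical * 1e6
        if error_ppm <= tolerance_ppm:
            hits.append(sequence)
    return hits


# Hypothetical database of C-terminal peptides and one observed mass.
database = ["AVDE", "TAYRPLGK", "GSHE"]
print(round(peptide_mass("AVDE"), 4))
print(match_by_mass(432.1860, database))
```

When several candidates survive the mass filter, fragment (MS/MS) data or retention properties can be used to narrow the list further.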
  • fragmentation of the peptides is most conveniently done by collision induced dissociation (CID) and is generally referred to as MS2 or tandem mass spectrometry.
  • peptide ions can decay during their flight after being volatilized and ionized in a MALDI-TOF-MS. This process is called post-source-decay (PSD).
  • selected peptides are transferred directly or indirectly into the ion source of an electrospray mass spectrometer and then further fragmented in the MS/MS mode.
  • partial sequence information of the peptides is collected from the MSn fragmentation spectra (where it is understood that n is larger than or equal to 2) and used for peptide identification in sequence databases described herein.
  • the present invention further provides a method for the identification of one or more proteins in a sample comprising proteins.
  • cleavage of a sample comprising proteins results in a protein peptide mixture comprising thousands of peptides and this overwhelms the resolving power of the currently available chromatographic systems and mass spectrometry systems.
  • a protein can be identified based on the identification of one or more of its constituting peptides.
  • the current invention provides methods to isolate specific subsets of peptides, and preferably peptides comprising C-terminal ends of the proteins in the sample.
  • This simplification of the original peptide mixture significantly reduces the co-elution of peptides in the secondary run and results in an efficient identification of the isolated peptides with analysers such as mass spectrometers or others. Since the isolated peptides, preferably C-terminal peptides of proteins, are most often unique identification elements for their corresponding parent proteins, identification of said peptides allows the identification of the proteins in the original sample comprising proteins. So, the task of identifying proteins in a sample comprising proteins by isolating and identifying one or more of their composite peptides becomes possible with the methods of the present invention.
  • the present invention therefore further provides a method to identify proteins in a sample comprising proteins, comprising isolating a subset of peptides comprising C-terminal ends of said proteins using the methods of the invention, and identifying the isolated peptides of said subset and their corresponding proteins.
  • the isolated unaltered peptides can be separately identified in each of the unaltered peptide fractions obtained in the secondary runs.
  • the invention allows the identification of a whole range of proteins in a sample comprising proteins, varying for instance from high to low abundant, from acidic to basic, from small to large, from soluble to membrane proteins. Furthermore, the invention provides a method to identify proteins in a sample comprising proteins, starting from very small amounts of cells, e.g., perhaps as few as 50,000 human cells, as well as, obviously, from larger numbers of cells. In another embodiment, the present invention provides a method to determine the relative amount of one or more proteins in two or more samples comprising proteins. The method comprises the use of differentially isotopically labelled isolated peptides, preferably peptides comprising C-terminal ends of proteins. In this method, the two samples are treated in such a way that the peptides isolated from one sample contain one isotope and the peptides isolated from a second sample contain another isotope of the same element.
  • the method comprises the steps of (a) labelling the peptides present in a first sample with a first isotope; (b) labelling the peptides present in a second sample with a second isotope; (c) combining the protein peptide mixture of the first sample with the protein peptide mixture of the second sample; (d) isolating a subset of peptides comprising C-terminal ends of the proteins using the methods of the invention; (e) performing mass spectrometric analysis of the isolated peptides; (f) calculating the relative amounts of the isolated peptides in each sample by comparing the peak heights of the identical but differentially isotopically labelled isolated peptides; and (g) determining the identity of the isolated peptide and its corresponding protein.
  • the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs. It is obvious that the same approach can be followed in combination with a pre-treatment step as mentioned here above. It is also obvious that, instead of mixing the peptides from both samples in step (c), peptides from a first and a second sample can be separately subjected to step (d) and be combined in any of the sub-steps of step (d) or in step (e).
  • the differential isotopic labelling of the peptides in a first and a second sample can be done in many different ways available in the art.
  • a key element is that a particular peptide originating from the same protein in a first and a second sample is identical, except for the presence of a different isotope in one or more amino acids of the peptide.
  • the isotope in a first sample will be the natural isotope, referring to the isotope that is predominantly present in nature, and the isotope in a second sample will be a less common isotope, hereinafter referred to as an uncommon isotope. Examples of pairs of natural and uncommon isotopes are H and D, 12C and 13C, 14N and 15N. Peptides labelled with the heaviest isotope of an isotopic pair are herein also referred to as heavy peptides.
  • Peptides labelled with the lightest isotope of an isotope pair are herein also referred to as light peptides.
  • a peptide labelled with H is called the light peptide, while the same peptide labelled with D is called the heavy peptide.
  • Peptides labelled with a natural isotope and its counterparts labelled with an uncommon isotope are chemically very similar, separate chromatographically in the same manner and also ionize in the same way.
  • in an analyser such as a mass spectrometer, they will segregate into the light and the heavy peptide.
  • the heavy peptide has a slightly higher mass due to the higher weight of the incorporated, chosen isotopic label.
  • the results of the mass spectrometric analysis of isolated peptides will be a plurality of pairs of closely spaced twin peaks, each twin peak representing a heavy and a light peptide.
  • Each of the heavy peptides is originating from the sample labelled with the heavy isotope; each of the light peptides is originating from the sample labelled with the light isotope.
  • the ratios (relative abundance) of the peak intensities of the heavy and the light peak in each pair are then measured. These ratios give a measure of the relative amount (differential occurrence) of that peptide (and its corresponding protein) in each sample.
  • the peak intensities can be calculated in a conventional manner (e.g. by calculating the peak height or peak surface).
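The ratio calculation for twin peaks can be sketched as below. The input layout (a mapping from peptide to a light and a heavy peak intensity) and the handling of single peaks are illustrative assumptions. Intensities may be peak heights or peak areas, as long as both members of a pair are measured the same way.

```python
def heavy_to_light_ratios(
    twin_peaks: dict[str, tuple[float, float]]
) -> dict[str, float]:
    """Compute the heavy/light intensity ratio for each isolated peptide.

    twin_peaks maps a peptide identifier to (light_intensity, heavy_intensity).
    A peptide seen only in the heavy-labelled sample yields an infinite ratio.
    """
    ratios = {}
    for peptide, (light, heavy) in twin_peaks.items():
        if light < 0 or heavy < 0:
            raise ValueError(f"negative intensity for {peptide}")
        if light == 0 and heavy == 0:
            raise ValueError(f"no signal for {peptide}")
        ratios[peptide] = float("inf") if light == 0 else heavy / light
    return ratios


# Hypothetical intensities for three isolated C-terminal peptides.
peaks = {
    "AVDE": (1000.0, 2000.0),  # twice as abundant in the heavy sample
    "GSHE": (500.0, 500.0),    # unchanged
    "LTPE": (800.0, 0.0),      # detected only in the light sample
}
print(heavy_to_light_ratios(peaks))
```

A ratio of 0 or infinity corresponds to the single-peak case, in which the peptide was detected in only one of the two samples.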
  • the isolated peptides can also be identified allowing the identification of proteins in the samples. If a protein is present in one sample but not in another, the isolated peptide (corresponding with this protein) will be detected as one peak which can either contain the heavy or light isotope. However, in some cases it can be difficult to determine which sample generated the single peak observed during mass spectrometric analysis of the combined sample. This problem can be solved by double labelling the first sample, either before or after the proteolytic cleavage, with two different isotopes or with two different numbers of heavy isotopes. Examples of labelling agents are acylating agents.
  • Incorporation of the natural and/or uncommon isotope in peptides can be obtained in multiple ways.
  • proteins are labelled in the cells.
  • Cells for a first sample are for instance grown in media supplemented with an amino acid containing the natural isotope and cells for a second sample are grown in media supplemented with an amino acid containing the uncommon isotope.
  • This method is well known in the art, e.g., SILAC (Stable isotope labelling with amino acids in cell culture), e.g., as in Ong et al. 2002 (Mol Cell Proteomics 1(5): 376-86) and further developments thereof.
  • Mixing of the proteins/peptides from both samples can be done at different time points.
  • the mixing can be done at the level of the sample (e.g. mixing an equal number of cells from both samples) or proteins can be isolated separately from sample 1 and sample 2 and subsequently mixed or proteins from sample 1 are digested into peptides and proteins from sample 2 are digested into peptides and the peptides originating from sample 1 and sample 2 are mixed, etc.
  • Incorporation of the differential isotopes can further be obtained with multiple labelling procedures based on known chemical reactions that can be carried out at the protein or the peptide level.
  • proteins can be changed by the guanidinylation reaction with O-methylisourea, converting NH2-groups into guanidinium groups, thus generating homoarginine at each previous lysine position.
  • Proteins from a first sample can be reacted with a reagent with the natural isotopes and proteins from a second sample can be reacted with a reagent with an uncommon isotope.
  • Peptides could also be changed by Schiff base formation with deuterated acetaldehyde followed by reduction with normal or deuterated sodium borohydride.
  • the samples may be differentially labelled using the iTRAQ technology with isobaric reagents that tag amine groups, essentially as taught in Ross et al. 2004 (Mol Cell Proteomics 3(12): 1154-69). These tags are preferably used in conjunction with tandem MS mode (in which peptides are isolated and fragmented), in which each tag generates a unique reporter ion.
  • acetyl N-hydroxysuccinimide ANHS
  • one sample can be acetylated with normal ANHS whereas a second sample can be acetylated with CD3CO-NHS.
  • the ε-NH2 group of all lysines is in this way derivatized in addition to the amino-terminus of the peptide.
  • Still other labelling methods are for example acetic anhydride which can be used to acetylate hydroxyl groups and trimethylchlorosilane which can be used for less specific labelling of functional groups including hydroxyl groups and amines.
  • the primary amino acids are labelled with chemical groups allowing differentiation between the heavy and the light peptides by 5 amu, by 6 amu, by 7 amu, by 8 amu or even by larger mass difference.
  • the differential isotopic labelling is carried out at the carboxy-terminal end of the peptides, allowing the differentiation between the heavy and light variants by more than 5 amu, 6 amu, 7 amu, 8 amu or even larger mass differences. Since the methods of the present invention do not require any prior knowledge of the type of proteins that may be present in the samples, they can be used to determine the relative amounts of both known and unknown proteins which are present in the samples examined.
  • the methods provided in the present invention to determine relative amounts of at least one protein in at least two samples can be broadly applied to compare protein levels in for instance cells, tissues, or biological fluids (e.g. nipple aspiration fluid, saliva, sperm, cerebrospinal fluid, urine, serum, plasma, synovial fluid), organs, and/or complete organisms.
  • a comparison includes evaluating subcellular fractions, cells, tissues, fluids, organs, and/or complete organisms which are, for example, diseased and non-diseased, stressed and non-stressed, drug-treated and non drug-treated, benign and malignant, adherent and nonadherent, infected and uninfected, transformed and untransformed.
  • the method also allows the comparison of protein levels in subcellular fractions, cells, tissues, fluids, organisms, complete organisms exposed to different stimuli or in different stages of development or in conditions where one or more genes are silenced or over-expressed or in conditions where one or more genes have been knocked-out.
  • the methods described herein can also be employed in diagnostic assays for the detection of the presence, the absence or a variation in expression level of one or more protein markers or a specific set of proteins indicative of a disease state (e.g., such as cancer, neurodegenerative disease, inflammation, cardiovascular diseases, viral infections, bacterial infections, fungal infections or any other disease).
  • Specific applications include the identification of target proteins which are present in metastatic and invasive cancers, the differential expression of proteins in transgenic mice, the identification of proteins that are up- or down-regulated in diseased tissues, the identification of intracellular changes in cells with physiological changes such as metabolic shift, the identification of biomarkers in cancers, the identification of signalling pathways.
  • the present invention further provides a method to quantitate the amount of one or more proteins in a single sample comprising proteins.
  • the method comprises the steps of: (a) preparing a protein peptide mixture; (b) adding to the mixture a known amount of a synthetic reference peptide labelled with an isotope distinguishable from the reference peptide isotope; (c) isolating a subset of peptides comprising C-terminal ends of the proteins using the methods of the invention; (d) performing mass spectrometric analysis of the isolated peptides; and (e) determining the amount of the protein present in the sample by comparing the peak heights of the synthetic reference peptide to the reference peptide.
  • the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs. It is obvious that the same methods can be followed in combination with a pre-treatment step as mentioned herein above.
  • Reference peptides are peptides whose sequence and/or mass is sufficient to unambiguously identify their parent protein. By preference, peptide synthesis of equivalents of reference peptides is easy.
  • a reference peptide as used herein is the native peptide as observed in the protein it represents, while a synthetic reference peptide as used herein is a synthetic counterpart of the same peptide.
  • Such synthetic reference peptide is conveniently produced via peptide synthesis but can also be produced recombinantly.
  • Peptide synthesis can for instance be performed with a multiple peptide synthesizer. Recombinant production can be obtained with a multitude of vectors and hosts as widely available in the art.
  • Reference peptides by preference ionize well in mass spectrometry.
  • a non-limiting example of a well ionizing reference peptide is a reference peptide which contains an arginine.
  • a reference peptide is also easy to isolate as above. In the latter preferred embodiment the reference peptide is simultaneously also an isolated peptide.
  • a reference peptide and its synthetic reference peptide counterpart are chemically very similar, separate chromatographically in the same manner and also ionize in the same way.
  • the reference peptide and its synthetic reference peptide counterpart are however differentially isotopically labelled.
  • the reference peptide and its synthetic reference peptide counterpart are altered in a similar way and are co-isolated, e.g., in the same fraction of the primary and the secondary run and in an eventual ternary run.
  • an analyzer such as a mass spectrometer, they will segregate into the light and heavy peptide.
  • the heavy peptide has a slightly higher mass due to the higher weight of the incorporated chosen heavy isotope. Because of this very small difference in mass between a reference peptide and its synthetic reference peptide, both peptides will appear as a recognizable closely spaced twin peak in a mass spectrometric analysis. The ratio between the peak heights or peak intensities can be calculated and these determine the ratio between the amount of reference peptide versus the amount of synthetic reference peptide. Since a known absolute amount of synthetic reference peptide is added to the protein peptide mixture, the amount of reference peptide can be easily calculated and the amount of the corresponding protein in the sample comprising proteins can be calculated.
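The calculation described above reduces to a simple proportion. The sketch below assumes that the reference peptide and its synthetic counterpart ionize identically, that digestion and recovery are complete, and that each protein molecule yields exactly one copy of the reference peptide. The function name and the units are illustrative.

```python
def reference_peptide_amount(
    reference_intensity: float,
    synthetic_intensity: float,
    synthetic_amount: float,
) -> float:
    """Amount of the native reference peptide, in the units of synthetic_amount.

    amount_reference = synthetic_amount * (I_reference / I_synthetic)
    """
    if synthetic_intensity <= 0:
        raise ValueError("synthetic reference peptide peak must be detected")
    if reference_intensity < 0 or synthetic_amount < 0:
        raise ValueError("intensities and amounts must be non-negative")
    return synthetic_amount * reference_intensity / synthetic_intensity


# Hypothetical twin peak: 10 pmol of synthetic peptide was added to the mixture.
amount_pmol = reference_peptide_amount(
    reference_intensity=3000.0,
    synthetic_intensity=1500.0,
    synthetic_amount=10.0,
)
print(f"{amount_pmol} pmol of reference peptide, hence of parent protein")
```

Under the stated one-to-one assumption, the amount of reference peptide equals the amount of its parent protein in the analysed portion of the sample.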
  • any of the here above mentioned methods to differentially isotopically label a peptide with an uncommon isotope can be applied (in vivo labelling, enzymatic labelling, chemical labelling, etc.).
  • in vivo labelling is to incorporate the commercially available deuterated methionine CH3-S-CD2-CD2-CH-(NH2)-COOH, adding 4 amu to the total peptide mass.
  • synthetic reference peptides could also contain deuterated arginine (H2NC-(NH)-NH-(CD2)3-CD-(NH2)-COOH), which would add 7 amu to the total peptide mass. It should be clear to one of skill in the art that every amino acid of which deuterated or 15N or 13C forms exist can be considered in this protocol. Many other methods can be used.
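The mass differences quoted for deuterated amino acids follow from the number of hydrogen atoms replaced by deuterium. A minimal sketch, using the monoisotopic masses of H and D, shows the exact shift and the resulting spacing of the twin peaks on the m/z axis for a given charge state. The function names are illustrative.

```python
MASS_H = 1.00782503  # monoisotopic mass of 1H in Da
MASS_D = 2.01410178  # monoisotopic mass of 2H (deuterium) in Da


def deuterium_mass_shift(n_deuterium: int) -> float:
    """Mass increase caused by replacing n hydrogen atoms with deuterium."""
    if n_deuterium < 0:
        raise ValueError("n_deuterium must be non-negative")
    return n_deuterium * (MASS_D - MASS_H)


def twin_peak_spacing(n_deuterium: int, charge: int) -> float:
    """Spacing between light and heavy peaks on the m/z axis."""
    if charge < 1:
        raise ValueError("charge must be at least 1")
    return deuterium_mass_shift(n_deuterium) / charge


# Methionine with 4 deuterium atoms and arginine with 7, as in the text above.
for label, n in (("Met-d4", 4), ("Arg-d7", 7)):
    print(label, round(deuterium_mass_shift(n), 4),
          "Da; spacing at 2+:", round(twin_peak_spacing(n, 2), 4))
```

For multiply charged ions the twin peaks move closer together, which is one reason larger mass differences between heavy and light variants can be preferable.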
  • the quantitative analysis of at least one protein in one sample comprising proteins comprises the steps of: (a) preparing a protein peptide mixture wherein the peptides carry an uncommon isotope (e.g. a heavy isotope); (b) adding to the protein peptide mixture a known amount of a synthetic reference peptide carrying natural isotopes (e.g. a light isotope); (c) isolating a subset of peptides comprising C-terminal ends of said proteins using the methods of the invention; (d) determination by mass spectrometry of the ratio between the peak heights of the reference peptide versus the synthetic reference peptide; and (e) calculation of the amount of protein, represented by the reference peptide, in the sample comprising proteins.
  • the peptide isolation involves the primary and secondary chromatographic runs in between the modification with the agent of the invention, the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs.
  • each synthetic reference peptide is added in an amount equimolar to the expected amount of its reference peptide counterpart.
  • the methods provided in the present invention to quantify at least one protein in a sample comprising proteins can be broadly applied to quantify proteins of different interest.
  • diagnostic assays can be developed by which the level of one or more proteins is determined in a sample by making use of the present invention.
  • Further applications of methods to isolate subsets of peptides from protein peptide mixtures are discussed in WO 02/077016 (especially p. 21, l. 22 through p. 37, l. 21 thereof, herein incorporated by reference), and the skilled person will be able to extend their applicability to peptide subsets obtainable in the present invention.
  • the methods of the invention can be used for the proteomics study of protein processing ("degradomics").
  • protein processing or degradation in vivo or in cell culture may produce protein fragments displaying novel C-terminal ends.
  • the methods of the present invention which can in general enrich for and isolate peptides comprising C-terminal ends of proteins, can be advantageously used to follow the appearance of novel C-terminal end peptides which can be identified and can be indicative of novel proteolytic processing, and/or follow the changes in absolute or relative quantity of known C-terminal end peptides, representative of known cleavage events.
  • Such methods may advantageously complement degradomics analysis based on the study of novel N-terminal peptides.
  • the methods of the invention can be used for the proteomics study of proteins from one species on the background of proteins from another species ("xenoproteomics").
  • peptides comprising C-terminal ends of proteins of one species may be specifically recognised and identified vis-a-vis peptides comprising C-terminal ends of proteins from another species.
  • this method may be used to specifically identify human proteins in body fluids of mice xenografted with human tissues, e.g., primary human tumours, so as to find potential biomarkers.
  • the invention is further illustrated with examples that are not to be considered limiting.
  • the second group of aspects of the invention concerns a method for protein identification and optionally quantification from a protein mixture comprising the steps: (a) fragmenting a mixture of proteins (PM) to obtain a protein peptide mixture (PPM);
  • the protein mixture (PM) may be subjected to chemical and/or enzymatic pre-treatment(s) such as to desirably block or alter selected moieties before and/or following fragmentation.
  • the mixture of proteins (PM) or the protein peptide mixture (PPM) may be reacted with one or more modifying reagents, simultaneously or sequentially in any suitable order, which reagents may preferably fall into the following classes: modifiers of primary amines, particularly modifiers of α-NH2 groups and/or Lys ε-NH2 groups; or modifiers of cysteine residues.
  • the sample may optionally be purified using known techniques, such as solvent evaporation, washing, filtration, chromatographic techniques, etc.
  • Suitable blocking reagents as well as methods and conditions for attaching and detaching protecting groups will be clear to the skilled person and are generally described in standard handbooks of organic chemistry, such as "Protecting Groups", P. Kocienski, Thieme Medical Publishers, 2000; Greene and Wuts, "Protective groups in organic synthesis", 3rd edition, Wiley and Sons, 1999; incorporated herein by reference in their entirety.
  • Cys -SH groups in the protein mixture (PM) or protein peptide mixture (PPM) are protected to avoid their reactivity, in particular oxidation, throughout the method.
  • the protein mixture (PM) or protein peptide mixture (PPM) is first treated with a reducing agent known per se, such as, e.g., β-mercaptoethanol, dithiothreitol (DTT), dithioerythritol (DTE) or a suitable trialkylphosphine, inter alia tris(2-carboxyethyl)phosphine (TCEP), to quantitatively reduce any oxidised -SH groups, e.g., disulphide bridges.
  • -SH groups may be converted to acetamide derivatives by treatment with iodoacetamide in denaturing buffers (e.g., guanidinium ion- or urea-containing buffers).
  • Other blocking reagents such as N-substituted maleimides (e.g., N-ethylmaleimide), acrylamide, N-substituted acrylamide or 2-vinylpyridine, may alternatively be used.
  • primary amino groups ("primary amino" alone or in combination refers to a group of formula -NH2, optionally in any dissociation or protonation state such as -NH3+), such as particularly α-NH2 groups and/or side chain primary amino groups including Lys ε-NH2 groups in the protein mixture (PM) or protein peptide mixture (PPM), may need to be modified to block their reactivity and/or to neutralise or otherwise alter the charge thereof, using a suitable reagent that reacts selectively with the desired primary amino groups and presents a non-reactive substituent for subsequent conditions.
  • the reagent may be generally substituted once or twice on each so-modified primary amine (i.e., -NH2 gives -NHZ or -NZ2, where Z is the substituent introduced by said reagent).
  • primary amines may be protected by acylation, more preferably acetylation, using reagents known per se, such as, e.g., acetyl N-hydroxysuccinimide.
  • acylation of primary amino groups can avoid protonation of so-modified groups under conditions of the present methods, thereby advantageously neutralising the charge of so-modified amino groups.
  • Other suitable NH₂-modifying reagents have been extensively described in the art, for example, in Regnier et al. 2006 (Proteomics 6: 3968-3979).
  • the acyl moiety may be occasionally also introduced on the -OH group of Ser and/or Thr.
  • ester bonds are preferably subsequently broken by alkali hydrolysis at conditions that do not affect the acylation of the -NH₂ groups.
  • the blocking step performed on the protein mixture (PM) should block at least the N-terminal α-NH₂ groups thereof, such as to introduce a charge difference between the N-terminus of the α-NH₂-blocked N-terminal peptides and the free α-NH₂-containing N-termini of internal and C-terminal peptides as generated during cleavage.
  • said blocking step may also protect any side chain primary amino groups such as Lys ε-NH₂ groups in the protein mixture (PM), which may also allow isolation of the C-terminal peptides containing a Lys, thereby increasing the representation of the parent proteins.
  • Blocking reagents such as acetyl N-hydroxysuccinimide are capable of blocking both α-NH₂ and side chain amino groups.
  • a protein peptide mixture may be obtained by fragmentation of a mixture of proteins, such as, e.g., by fragmentation of all or a fraction of proteins present in and/or isolated from a biological sample after the sample has been removed from biological source.
  • the invention in particular analyses N- and/or C-terminal peptides of proteins.
  • N-terminal peptides or C-terminal peptides generated by fragmentation from individual molecules of a given protein have the same length, i.e., that fragmentation generating such N-terminal peptides (or C-terminal peptides) occurs at the same peptide bond in substantially all individual molecules of said protein.
  • a peptide bond adjacent to a given amino acid residue may be the N-terminally adjacent peptide bond, or the C-terminally adjacent peptide bond.
  • a protein mixture (PM) will be fragmented at substantially all recited peptide bonds.
  • the fragmentation would occur substantially quantitatively at peptide bonds N-terminally or C-terminally adjacent to amino acid residues of the one or more types X₁ ... Xₙ.
  • Xₙ adjacent to which fragmentation is contemplated herein may be selected from any amino acid residues, including but not limited to amino acids found in naturally occurring proteins, amino acids carrying a co- or post-translational modification, amino acids including a non-natural isotope, or amino acids further chemically and/or enzymatically altered prior to the fragmentation, etc.
  • a suitable frequency of cleavage may be preferably achieved when the fragmentation takes place adjacent to one or more of the 20 common amino acid residue types found in natural proteins and/or adjacent to one or more of residue types obtained from any of the 20 common amino acid residue types by suitable modification of the starting proteins. Accordingly, in a preferred embodiment, the mixture of proteins (PM) is fragmented preferentially at peptide bonds adjacent to one or more amino acid residue types X₁ ... Xₙ chosen from the group consisting of: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser and Thr; optionally including a co- or post-translational modification, chemically and/or enzymatically altered prior to the fragmentation, or including a non-natural isotope, etc.
  • N-terminal and/or C-terminal peptides are isolated from the resulting protein peptide mixture (PPM).
  • Isolation of N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM) requires that the majority of or preferably substantially all N-terminal and/or C-terminal peptides are distinct from the majority of or preferably substantially all remaining peptides of the protein peptide mixture (PPM) with respect to one or more physical and/or chemical properties.
  • N-terminal and/or C-terminal peptides are isolated from the remaining peptides of a protein peptide mixture (PPM) particularly on the basis of dissimilar net charge.
  • Techniques capable of separating peptides on the basis of net charge include without limitation ion exchange chromatography and zwitterionic ion exchange chromatography (see, e.g., WO 00/27496), chromatofocusing and various electrophoretic techniques such as inter alia isoelectric focusing.
  • Preferred techniques for use herein encompass ion exchange chromatography, including cation or anion exchange chromatography, preferably including strong cation exchange (SCX) or strong anion exchange (SAX) chromatography.
  • Non-limiting embodiments "Ea” to “Ee” contemplate preferred manners of endowing the majority of or substantially all N-terminal and/or C-terminal peptides with a net charge distinct from that of the majority of or substantially all remaining peptides of the protein peptide mixture (PPM).
  • the protein mixture (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types X₁ ... Xₙ that are basic.
  • most C-terminal peptides will not include a basic amino acid (unless a basic amino acid was the actual C-terminal residue of the respective protein), whereas most or all N-terminal and internal peptides will comprise a basic amino acid as their last residue. Therefore, under conditions where the side chain moiety of said basic amino acids is protonated, the net charge of C-terminal peptides will in general be lower than the net charge of N-terminal and internal peptides. This general difference between the net charge of C-terminal peptides vis-a-vis the remaining peptides allows for isolating or enriching the C-terminal peptides from the protein peptide mixture (PPM).
  • C-terminal peptides will in general have about zero net charge, while N-terminal and internal peptides will in general have net charge of about +1.
  • the conditions may also be such that the acidic side chain moiety of the majority of or substantially all acidic amino acids present in the peptides (particularly the -COOH moiety of Asp and Glu) is not dissociated. This can reduce the confounding effect of such acidic moieties on the method.
  • Conditions as above may preferably encompass pH of about 4.0 or lower, preferably of about 3.0 or lower, such as, e.g., between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5, e.g., between 2.75 and 3.25, and still more preferably about 3, such as, e.g., 2.80, 2.85, 2.90, 2.95, 3.0, 3.05, 3.10, 3.15 or 3.20.
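  • The effect of the pH window recited above on Asp and Glu side chains can be estimated with the Henderson-Hasselbalch relation. The following is a minimal sketch only: the side chain pKa values are assumed textbook-style figures that are not taken from this description, and real values shift with sequence context and solvent.

```python
def fraction_dissociated(pka: float, ph: float) -> float:
    """Fraction of an acidic group present as the charged conjugate base
    at a given pH, from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed, approximate side chain pKa values (illustrative only).
ASSUMED_PKA = {"Asp": 3.9, "Glu": 4.3}

if __name__ == "__main__":
    for ph in (2.5, 3.0, 3.5, 4.0):
        row = ", ".join(
            f"{residue}: {fraction_dissociated(pka, ph):.2f}"
            for residue, pka in ASSUMED_PKA.items()
        )
        print(f"pH {ph:.2f} -> fraction dissociated {row}")
```

  • Under these assumed values, only a minor fraction of the side chain carboxyl groups is dissociated around pH 3, and that fraction grows as the pH approaches 4.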
  • C-terminal peptides can be preferably isolated using cation exchange chromatography.
  • C-terminal peptides can be isolated using SCX.
  • the ⁇ -NH 2 group of proteins in the protein mixture (PM) may be blocked prior to fragmentation, e.g., acylated preferably acetylated, to prevent protonation thereof. Consequently, N-terminal peptides produced by fragmentation C-terminally adjacent to basic amino acid residues will, under the above conditions, generally display net charge similar to C-terminal peptides (since the blocked ⁇ -NH 2 group of N-terminal peptides is not charged). Co-isolation of N- and C-terminal peptides from the protein peptide mixture (PPM) is thus possible thereby potentially increasing the confidence of subsequent protein identification.
  • the charge of some or all basic amino acids after which the fragmentation does not occur may be preferably neutralised by suitable side chain modification - such as, e.g., by acetylation of Lys, by modification of His by diethylpyrocarbonate, or by modification of Arg by phenylglyoxal - such that the presence of said amino acids in the peptides does not alter the overall net charge of the latter.
  • a protein mixture is fragmented by trypsin or a trypsin-like protease to yield a protein peptide mixture (PPM).
  • So isolated peptides mostly represent C-terminal peptides derived from the parent proteins.
  • the peptides are isolated using cation exchange chromatography, more preferably SCX.
  • the peptides are isolated at pH as recited above, particularly at pH between 2.5 and 4.0, more preferably about 3.
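  • The reasoning behind this embodiment can be illustrated with a simple in silico digestion. The sketch below is an illustration under stated assumptions, not the procedure of the invention: it cleaves C-terminally of every Lys and Arg, ignores missed cleavages and the reduced cleavage before Pro, and uses a made-up sequence. The function names are hypothetical.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Cleave C-terminally of Lys (K) and Arg (R).

    Simplified rule: missed cleavages and the Pro exception are ignored.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        # Do not cut after the very last residue of the protein.
        if residue in "KR" and i < len(sequence) - 1:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def classify(sequence: str) -> dict[str, list[str]]:
    """Group the digestion products by their position in the parent protein."""
    peptides = tryptic_digest(sequence)
    return {
        "n_terminal": peptides[:1],
        "internal": peptides[1:-1],
        "c_terminal": peptides[-1:] if len(peptides) > 1 else [],
    }


if __name__ == "__main__":
    # Hypothetical sequence, used only for illustration.
    for kind, peps in classify("MKTAYRAGLS").items():
        print(kind, peps)
```

  • In this toy case the N-terminal and internal peptides end in a basic residue, whereas the C-terminal peptide does not, which is the charge difference exploited for cation exchange sorting.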
  • In embodiment "Ec", at least the α-NH₂ groups, and possibly also the side chain primary amino groups particularly including the ε-NH₂ groups of Lys, in the proteins of a protein mixture (PM) are blocked, preferably acylated, more preferably acetylated, and the so-modified protein mixture (PM) is fragmented by trypsin or a trypsin-like protease to yield a protein peptide mixture (PPM).
  • So isolated peptides mostly represent N- and C-terminal peptides derived from the starting proteins.
  • the peptides are isolated using cation exchange chromatography, more preferably SCX.
  • the peptides are isolated at pH as described above, particularly at pH between 2.5 and 4.0, more preferably about 3.
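  • The charge argument for this embodiment can be sketched with a nominal charge count. This is a deliberately crude model resting on assumptions not stated in the description: every free α-amine, Arg, His and unblocked Lys is counted as +1 near pH 3, all carboxyl groups are counted as uncharged, and trypsin is assumed not to cleave at acetylated Lys. The peptides are made up and the function name is hypothetical; only the charge differences, not the absolute values, are meaningful.

```python
def nominal_charge(peptide: str, alpha_amine_free: bool, lys_blocked: bool = True) -> int:
    """Count groups assumed to be protonated near pH 3.

    Carboxyl groups are treated as uncharged (a simplification).
    """
    charge = 1 if alpha_amine_free else 0
    charge += peptide.count("R") + peptide.count("H")
    if not lys_blocked:
        charge += peptide.count("K")
    return charge


if __name__ == "__main__":
    # Hypothetical peptides from a protein acetylated on its alpha-amine and
    # on Lys before digestion, so that cleavage is assumed to occur after Arg only.
    peptides = {"n_terminal": "MKTAYR", "internal": "AGLSVR", "c_terminal": "DLTK"}
    for kind, pep in peptides.items():
        # Only the original protein N-terminus carries the blocking group;
        # amines created by cleavage are free.
        free = kind != "n_terminal"
        print(kind, pep, nominal_charge(pep, alpha_amine_free=free))
```

  • In this simplified picture the N-terminal and C-terminal peptides carry one nominal positive charge fewer than the internal peptide, so they would be expected to bind a cation exchanger less strongly. Peptides containing His depart from this pattern unless His is also modified.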
  • the protein mixture (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types X₁ ... Xₙ that are acidic.
  • the term "acidic amino acid” generally refers to amino acids, particularly α-L-amino acids, wherein the dissociation constant pKa of their side chain is ⁇ 5, preferably ⁇ 4 or lower.
  • Particular acidic amino acids include Asp and Glu, which comprise a side chain carboxyl moiety.
  • the conditions may also be such that the basic side chain moiety of the majority of or substantially all basic amino acids present in the peptides (particularly Lys, Arg and His) is not protonated; this can reduce the confounding effect of such basic moieties on the method.
  • This general difference between the net charge of C-terminal peptides vis-a-vis the remaining peptides allows for isolating or enriching the C-terminal peptides from the protein peptide mixture (PPM), e.g., by SAX.
  • the α-NH₂ group and/or the α-COOH group of proteins in the protein mixture (PM) may be modified to introduce thereon a moiety having one or more positive charges (e.g., a basic moiety) or one or more negative charges (e.g., an acidic moiety such as sulphonate moiety) wherein said charges are present at least under conditions of subsequent peptide separation, e.g., using ion exchange chromatography.
  • the newly generated free α-NH₂ groups and/or α-COOH groups may be optionally and preferably modified to introduce thereon a moiety having one or more charges opposite to those added on the α-NH₂ group and/or α-COOH of the N-terminal and/or C-terminal peptides.
  • introduction of charged moieties onto N-terminal and/or C-terminal peptides may ensure better ion fragmentation of so-charged peptide species (e.g., as compared to acetylated N-terminal peptides) during MS analysis.
  • the charged moiety introduced onto N-terminal and/or C-terminal peptides may be a weak base or a weak acid moiety.
  • such moieties may be endowed with charge when required (e.g., to enable ion exchange-based peptide sorting as above) but may be kept uncharged if presence of such charge would be undesired in other separation steps.
  • Inclusion of a weak base moiety can be particularly advantageous since it can greatly facilitate ion fragmentation of such peptides during MS.
  • the introduced charged moiety may be a strong base or a strong acid moiety, which maintains its charge substantially irrespective of the solvent pH.
  • N-terminal and C-terminal peptides may be enriched on the basis of their distinct net charge as described in embodiments "Eaa” or "Ec".
  • N-terminal peptides may be further isolated by removing peptides containing a free α-NH₂ group (which mostly include C-terminal peptides and non-tryptic peptides) using affinity separation or chromatography with capture agents having affinity to primary amines.
  • Capture agents having strong affinity for primary amino groups include without limitation crown ethers, such as, e.g., 18-crown-6 ether or derivatives thereof.
  • the isolated N-terminal and/or C-terminal peptides are subsequently separated into fractions of peptides by a two- or more-dimensional (multidimensional) separation process, as described in the Summary section.
  • one or more or all separation steps of the multidimensional separation process may be by chromatography (i.e., multidimensional chromatography).
  • the separation process may be multidimensional chromatography, such as, e.g., 4D-chromatography, 3D-chromatography or two-dimensional chromatography, preferably orthogonal chromatography.
  • chromatography preferably employs liquid mobile phase.
  • the chromatography may be columnar, i.e., wherein the stationary phase is deposited or packed in a column.
  • the chromatography is HPLC. Columns and conditions for performing HPLC separation are generally known to the skilled person, and described in, e.g., Practical HPLC Methodology and Applications, Bidlingmeyer, B. A., John Wiley & Sons Inc., 1993.
  • Stationary phase for use in chromatography may commonly comprise solid support functionalised with one or more moiety types intended for interaction with analytes and/or for allowing formation of a liquid stationary phase film on the support.
  • solid supports for use in separation methods including chromatography are generally known in the art, being solid materials that are structurally stable and chemically inert under conditions of separation and which exhibit low or no non-specific interactions with analytes.
  • Solid supports should allow for the immobilisation thereon of one or more functionalising moieties. Methods for immobilisation of moieties of interest to solid supports, and optionally the choice of spacers or linkers therefore, are well known in the field; see, e.g., Immobilized Affinity Ligand Techniques, Hermanson, G. T. et al, Academic Press, INC, 1992; Combinatorial Chemistry, Eds: Bannwarth, Willi, Hinzen, Berthold, Wiley-VCH.
  • Solid supports may be made from organic or inorganic materials or hybrid organic/inorganic materials, and may be polymer-based materials.
  • solid supports include ones prepared from a native polymer, such as cross-linked carbohydrate material, including, e.g., agarose, agar, cellulose, dextran, chitosan, konjac, carrageenan, gellan, alginate, etc.; or ones prepared from a synthetic polymer or copolymer, such as cross-linked synthetic polymers, e.g., styrene or styrene derivatives, divinylbenzene, acrylamides, acrylate esters, methacrylate esters, vinyl esters, vinyl amides, etc.; or solid supports prepared from an inorganic polymer, such as silica, which is particularly suitable for inter alia HPLC.
  • Inorganic porous and non-porous supports are well known in this field, some of which are commercially available.
  • matrix materials include, but are not limited to, those based on silica, polystyrene, POROS®, Sepharose®, Sepharopore™, and other variants thereof.
  • a skilled person can choose suitable solid support material based on the type of separation, expected unwanted non-specific interactions, capacity, loadability and flow characteristics, etc.
  • a solid support can be in the form of, e.g., beads, pellets, resin, small particles, a membrane, a frit, a sintered cake, pillars in microfabricated structures or a monolith or any other form desirable for use.
  • the solid support particles can have, for example, a spherical shape, a regular shape or an irregular shape.
  • suitable particle sizes may be in the diameter range of about 1-500 μm, such as about 2-200 μm or about 5-100 μm, e.g., about 5-50 μm.
  • Size of particles for use in HPLC may preferably be in the diameter range of about 1-10 μm, preferably about 5 μm.
  • solid supports may be comprised in a chromatography column as a chromatography matrix, in a solid phase extraction (SPE) cartridge, in a magnetic bead, in a centrifugable or filterable bead or in any other known format suitable for separations.
  • one or more chromatographic separation steps may involve reversed phase (RP) liquid chromatography, preferably RP-HPLC.
  • Exemplary stationary phases for RP chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as listed in the Summary section.
  • Commercially available chromatography columns functionalised with moieties suitable for RP-HPLC may be used in the present method, such as without limitation ones summarised in Table 3:
  • the loading mobile phase is aqueous in nature comprising a (low) percentage of organic modifier (e.g., ACN or methanol).
  • the peptides are separated using a solution comprising constant or gradually increasing (gradient) percentages of a water miscible solvent with hydrophobic properties such as acetonitrile (ACN), an alcohol (e.g. methanol, ethanol) or other solvents known in the art of reversed phase separation.
  • one or more chromatographic separation steps may involve hydrophilic interaction chromatography (HILIC), such as ZIC-HILIC.
  • Exemplary stationary phases for HILIC chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as listed in the Summary
  • Sequant AB (Umeå, Sweden), Tosoh Bioscience (Tessenderlo, Belgium), PolyLC Inc. (Columbia, Maryland)
  • the loading mobile phase is hydrophobic in nature (e.g., ACN) comprising a (low) percentage of water in order to generate hydrophilic stationary phase.
  • the peptides are separated using a solution comprising constant or gradually increasing (gradient) percentages of a water or buffer with hydrophilic properties.
  • the inventors have realised that the high percentage of hydrophobic solvent (e.g., ACN) needed to load HILIC columns may precipitate some peptides.
  • a short RPLC column e.g., a short C18 column.
  • the length of said short column may be, e.g., less than about 5 cm, e.g., less than about 4 cm, more preferably less than about 3 cm, e.g., less than about 2.5 cm, even more preferably less than about 2 cm, such as, e.g., between about 0.5 cm and about 2 cm, more preferably between about 1 cm and about 2 cm.
  • Peptides are first loaded onto the RPLC column in a predominantly aqueous solution (e.g., between about 80% and about 100% aqueous solution, preferably between about 90% and 100% and more preferably about 100% aqueous solution) whereby the peptides remain bound to the RPLC stationary phase.
  • the peptides are eluted from the RPLC column onto the in-line downstream HILIC column using a highly hydrophobic solvent (e.g., at least 70% hydrophobic, more preferably at least 80% or at least 85% hydrophobic solvent, such as, e.g., about 85% ACN).
  • Table 5 lists several preferred but non-limiting setups of two-dimensional chromatography preferred as the multidimensional separation process of the invention, which display particular orthogonal properties and effective resolution of the N-terminal and/or C-terminal peptides:
  • RP-HPLC at low pH may be preferably used as 2nd (or ultimate) dimension due to its advantageous compatibility with downstream MS analysis.
  • HILIC may be used as 2nd (or ultimate) dimension due to MS compatibility.
  • the isolated N-terminal and/or C-terminal peptides are subsequently separated into fractions of peptides by a 1D long-column chromatography separation, preferably liquid chromatography, more preferably HPLC.
  • 1D long-column chromatography may use stationary phases, mobile phases, solid supports, functionalising moieties, etc. as described above in relation to multidimensional chromatography, with the distinction that the length of the column is increased.
  • 1D long-column chromatography may involve RPLC, preferably RP-HPLC, as taught above, such as without limitation C18 or phenyl-based RPLC.
  • 1D long-column chromatography may involve hydrophilic interaction chromatography (HILIC), such as ZIC-HILIC, optionally with RPLC pre-loading (e.g., C18 pre-loading), as described above.
  • chromatographic beds may need to be characterized by an inherent high permeability and/or ability to withstand high pressures and/or temperatures.
  • silica monolithic beds fulfil the former requirement.
  • Zorbax Stable Bond particles (Agilent) can be used at temperatures up to 90 °C and can tolerate relatively high pressures.
  • commercially available columns can be coupled or columns can be constructed in one piece utilizing commercially available particles.
  • the N-terminal and/or C-terminal peptides recovered using the above methods are highly representative of and can thus identify the corresponding proteins in a starting sample.
  • separation, analysis and/or identification of peptides resolved herein may be performed using a mass spectrometer. Otherwise, said peptides may be analysed and/or identified using other methods such as, e.g., activity measurement in assays, analysis with specific antibodies, Edman sequencing, etc.
  • peptides released (e.g., eluted) from the final step of the multidimensional separation process can be directly (on-line) fed to an analyser (such as, e.g., on-line LC/MS/MS).
  • peptides may be collected in fractions which, optionally following additional manipulation (e.g., concentration and/or spotting onto a MALDI-matrix; or advantageously, mixing with matrix in a microtee prior to deposition on MALDI targets, thereby eliminating the need for concentration and manual spotting; etc.), can be fed to an analyser.
  • peptides resolved herein are analysed and identified using mass spectrometry, preferably high-throughput mass spectrometric (MS) techniques known per se, that can obtain precise information on the mass of the peptides and preferably also on (partial) amino acid sequence of the peptides (e.g., in tandem mass spectrometry, MS/MS; or in post source decay TOF MS).
  • MS arrangements and instruments appropriate for peptide analysis are commonly known and may include, without limitation, matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS systems; MALDI-TOF post-source-decay (PSD) systems; MALDI-TOF/TOF systems; electrospray ionisation (ESI) 3D or linear (2D) ion trap MS systems; ESI triple quadrupole MS systems; ESI quadrupole orthogonal TOF systems (Q-TOF); or ESI Fourier transform MS systems; etc.
  • MS/MS may be achieved using manners established in the art, such as, e.g., collision induced dissociation (CID).
  • Algorithms and software exist in the art that compare experimental mass spectra and optionally also (partial) sequence information for the analysed peptides with a database of peptide masses/sequences predicted on the basis of sequencing information in protein and nucleic acid databases, and identify the corresponding peptides: e.g., ProFound, X! Tandem, (http://prowl.rockefeller.edu), MASCOT (http://www.matrixscience.com, Matrix Science Ltd.), Sequest (http://fields.scripps.edu/sequest/; US 6,017,693; US 5,538,897) and OMSSA (http://pubchem.ncbi.nlm.nih.gov/omssa/).
  • Identification of N-terminal peptides can also benefit from the use of specialised N-terminally ragged databases to account for protein processing, as known in the art (e.g., Gevaert et al. 2003. Nat Biotechnol 21: 566-569; Martens et al. 2005. Proteomics 5: 3139-3204).
  • the herein disclosed methods may achieve identification of any number or even substantially all (i.e., comprehensive analysis) N- and/or C-terminal peptides present in starting protein peptide mixtures (PPM).
  • the methods may further encompass art-established technique(s) for determining the relative or absolute quantity of one or more proteins in the starting sample (see, e.g., WO 03/016861, WO 02/084250 or WO 2004/111636).
  • the methods and systems of the present invention may be employed to identify proteins differentially present between samples, more preferably biomarkers.
  • Marker or “biomarker” as used herein refer to a protein or polypeptide which is differentially present in a sample taken from subjects having a genotype or phenotype of interest and/or who have been exposed to a condition of interest (herein “query sample”), as compared to an equivalent sample taken from control subjects not having said genotype or phenotype and/or not having been exposed to said condition (herein “control sample”).
  • Samples can be as disclosed above and may be broadly applied to compare for instance subcellular fractions, cells, tissues, biological fluids (e.g., nipple aspiration fluid, saliva, sperm, cerebrospinal fluid, urine, blood, serum, plasma, synovial fluid), organs and/or complete organisms.
  • a particularly relevant phenotype may be a pathological condition of interest in patients, such as, e.g., cancer, an inflammatory disease, autoimmune disease, metabolic disease, CNS disease, ocular disease, cardiac disease, pulmonary disease, hepatic disease, gastrointestinal disease, neurodegenerative disease, genetic disease, infectious disease or viral infection; vis-a-vis the absence of such conditions in healthy controls.
  • Other comparisons may be envisaged between samples from, e.g., stressed vs. non-stressed conditions/subjects, drug-treated vs. non drug-treated conditions/subjects, benign vs. malignant diseases, adherent vs. non-adherent conditions, infected vs. uninfected conditions/subjects, transformed vs. untransformed cells or tissues, different stages of development, conditions of overexpression vs. normal expression of one or more genes, conditions of silencing or knock-out vs. normal expression of one or more genes, and so on.
  • a marker may be a protein which is present at an elevated level or at a decreased level in query samples compared to control samples.
  • a marker may also be a protein which is detected at a higher frequency or at a lower frequency in query samples compared to control samples.
  • a protein may be differentially present between two samples if the protein's quantity in one sample is at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900% or at least about 1000% of its quantity in the other sample; or if it is detectable in one sample but not detectable in the other sample.
  • a protein may be differentially present between two sets of samples if the frequency of detecting the protein in one set of samples is at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900% or at least about 1000% of the frequency of detecting the protein in the other set of samples; or if the protein is detectable at a given frequency in one set of samples but is not detected in the second set of samples.
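  • The threshold criteria above can be expressed as a small predicate. The following is a minimal sketch, assuming that quantities (or detection frequencies) are non-negative numbers and that "not detectable" is represented by zero; the function name and the default threshold of 120% are illustrative choices, the latter being the lowest of the thresholds recited above.

```python
def is_differential(quantity_a: float, quantity_b: float,
                    threshold_percent: float = 120.0) -> bool:
    """Return True if one value is at least `threshold_percent` of the other,
    or if the protein is detected in only one of the two samples (or sets)."""
    if quantity_a < 0 or quantity_b < 0:
        raise ValueError("quantities must be non-negative")
    low, high = sorted((quantity_a, quantity_b))
    if high == 0:
        return False  # not detected in either sample
    if low == 0:
        return True   # detectable in one sample but not in the other
    return 100.0 * high / low >= threshold_percent


if __name__ == "__main__":
    print(is_differential(150.0, 100.0))          # 150% of the other sample
    print(is_differential(110.0, 100.0))          # below the 120% threshold
    print(is_differential(0.0, 5.0))              # detected in one sample only
    print(is_differential(100.0, 150.0, 200.0))   # stricter 200% threshold
```

  • The comparison is symmetric, so the same call covers both elevated and decreased levels in the query sample.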
  • the present methods may be employed to identify proteins differentially present between query and control samples, thereby identifying potential biomarkers.
  • query samples and control samples may be analysed separately and abundances of corresponding peptides may be subsequently compared there between. This is generally known in the art as label-free profiling.
  • the samples may be analysed in the same separation experiment insofar as peptides derived from such samples are differentially labelled, allowing a given readout to be attributed to one of the starting samples.
  • Such differentially-labelled samples may be analysed in the same separation experiment.
  • the mass difference caused by the presence of other isotopes allows peaks corresponding to equivalent peptides from the differentially-labelled samples to be distinguished on MS, and their relative intensities to be compared.
  • the protein peptide mixture (PPM) to be analysed according to the methods of the invention may be prepared by combining, preferably in equal amounts:
  • PPM1 protein peptide mixture
  • PPM2 protein peptide mixture
  • one or more N-terminal and/or C-terminal peptides differentially present between the first and second samples can be identified by comparing the peak heights or areas of identical but differentially isotopically labelled peptides. The identity of the isolated peptide and its corresponding protein - potentially representing a biomarker - can then be determined.
  • the invention thus also provides a method for identification of proteins differentially present between a first protein mixture (PM1) (such as, e.g., a protein mixture from a query sample or a pool thereof) and a second protein mixture (PM2) (such as, e.g., protein mixture from a control or reference sample or a pool of any thereof) comprising the steps: (a) fragmenting the first protein mixture PM1 to obtain a first protein peptide mixture (PPM1) and fragmenting the second protein mixture PM2 to obtain a second protein peptide mixture (PPM2);
  • the differential isotopic labelling of peptides in the first and second samples can be done in many different ways available in the art.
  • a key element is that a particular peptide originating from the same protein in a first and second sample is identical, except for the presence of a different isotope in one or more amino acids of the peptide. Examples of pairs of distinguishable isotopes are ¹²C and ¹³C, ¹⁴N and ¹⁵N, or ¹⁶O and ¹⁸O.
  • Peptides labelled with such isotopes are chemically very similar, separate chromatographically in the same manner and also ionise in the same way. However, when fed into an analyser, such as MS, they will segregate into the distinguishable light and heavy peptide.
  • the results of the mass spectrometric analysis of isolated peptides will thus be a plurality of pairs of closely spaced twin peaks, each twin peak representing a heavy and a light peptide.
  • the ratios (relative abundance) of the peak intensities of the heavy and light peak in each pair are then measured. These ratios give a measure of the relative amount (differential presence) of that peptide (and its corresponding protein) in each sample.
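  • The ratio measurement described above can be sketched as follows. This is an illustrative example only: the input format (a mapping from peptide to a pair of light and heavy peak intensities), the use of a log2 scale and the function name are assumptions, and the peptide names in the usage example are placeholders.

```python
import math


def heavy_light_log2_ratios(
    peak_pairs: dict[str, tuple[float, float]],
) -> dict[str, float]:
    """Map each peptide to log2(heavy / light) from its twin-peak intensities.

    Pairs with a missing (non-positive) peak are skipped; such singletons
    need separate handling.
    """
    ratios = {}
    for peptide, (light, heavy) in peak_pairs.items():
        if light <= 0 or heavy <= 0:
            continue
        ratios[peptide] = math.log2(heavy / light)
    return ratios


if __name__ == "__main__":
    # Placeholder peptides with made-up (light, heavy) intensities.
    pairs = {"PEPA": (100.0, 200.0), "PEPB": (50.0, 50.0), "PEPC": (0.0, 10.0)}
    for peptide, ratio in heavy_light_log2_ratios(pairs).items():
        print(f"{peptide}: log2(heavy/light) = {ratio:+.2f}")
```

  • A log2 scale treats increases and decreases symmetrically: a value of +1 corresponds to a doubling in the heavy-labelled sample and -1 to a halving.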
  • the peak intensities can be calculated in a conventional manner (e.g., by calculating the peak height or peak surface). Incorporation of isotopes into peptides can be obtained in multiple ways.
  • proteins are labelled by growing cells in media supplemented with an amino acid containing the different isotopes (SILAC; see, e.g., Ong et al. 2002, Mol Cell Proteomics 1(5): 376-86).
  • the different isotopes can be incorporated by an enzymatic approach.
  • labelling can be carried out by treating one sample comprising proteins with trypsin in H₂¹⁶O and the second sample comprising proteins with trypsin in H₂¹⁸O. Trypsin incorporates two oxygens of water at the COOH-termini of the newly generated sites during cleavage.
  • treating protein peptide mixture post-digestion with trypsin in H₂¹⁶O or H₂¹⁸O leads to incorporation of one oxygen (¹⁶O or ¹⁸O, respectively) at the COOH-termini of the component peptides (see, e.g., US 2006/105415).
  • the samples and/or mixture thereof may be acidified, e.g., to pH less than about 5, more preferably less than about 4, even more preferably to pH about 3.
  • the labelled samples may be added to an already acidic solution, such that after sample addition desired acidic pH is attained and back-exchange is immediately prevented.
  • the acidification may be with TFA (trifluoroacetic acid), which is particularly compatible with downstream SCX sorting. Acidification, and particularly TFA-mediated acidification, provides an advantageous, SCX-compatible alternative to the guanidinium HCl/TCEP/IAA extraction used in the art to inactivate trypsin and prevent back-exchange, which conditions are considerably less SCX-compatible.
  • differential isotopes can be incorporated into peptides by chemical labelling reactions known in the art.
  • peptides can be changed by Schiff base formation with deuterated acetaldehyde followed by reduction with normal or deuterated sodium borohydride. This reaction, which is known to proceed in mild conditions, may lead to the incorporation of a predictable number of deuterium atoms.
  • Peptides will be changed either at the α-NH₂ group, or at the ε-NH₂ groups of lysines, or on both. Similar changes may be carried out with deuterated formaldehyde followed by reduction with NaBD₄, which will generate a trideutero-methylated form of the amino groups.
  • the reaction with formaldehyde could be carried out either on the total protein, incorporating deuterium only at lysine side chains, or on the peptide mixture, where both the α-NH₂ and lysine-derived NH₂ groups will be labelled.
  • the samples may be differentially labelled using the iTRAQ technology with isobaric reagents that tag amine groups, essentially as taught in Ross et al. 2004 (Mol Cell Proteomics 3(12): 1154-69). These tags are preferably used in conjunction with tandem MS mode in which each tag generates a unique reporter ion.
  • the methods of the invention may also be employed in a diagnostic mode to detect the presence, absence or a variation in expression level of one or more biomarkers or a specific set of proteins indicative of a disease state (e.g., such as cancer, neurodegenerative disease, inflammation, cardiovascular diseases, viral infections, bacterial infections, fungal infections or any other disease) in a sample.
  • Example 1 Flow chart of isolation of peptides comprising C-terminal ends of proteins
  • Figure 1 shows a flow-chart of an exemplary method for isolating peptides comprising C-terminal ends of proteins from a protein peptide mixture.
  • Preparation of peptides in the first step involves acetylation of ε-NH₂ groups of lysines and of N-terminal α-NH₂ groups, as well as alkylation of -SH groups of cysteines.
  • Protein peptide mixture (synthetic or trypsin digest) is brought to pH 9 with NaOH in 100 mM sodium pyrophosphate; p-hydroxyphenylglyoxal is added at 100-fold molar excess in the same buffer at pH 9; the reaction is allowed to proceed in the dark at room temperature for 3 hours and is subsequently quenched by desalting.
  • Figure 2 depicts the stoichiometry of the reaction of p-hydroxyphenylglyoxal with the guanidino group of an arginine residue in a peptide chain (R, R depict in this figure the remaining portions of the peptide chain).
  • Protein peptide fractions from RP-HPLC (4 min wide) are dried and re-dissolved in 100 mM NaOH to which 3 mg of NMA (sodium salt) is added. Modification reaction is allowed to proceed for 2 hours at 30 °C, and is subsequently quenched with 50 µl 200 mM acetic acid.
  • Figure 3 depicts the stoichiometry of the reaction of NMA with the guanidino group of an arginine residue in a peptide chain (R, R depict in this figure the remaining portions of the peptide chain).
  • Protein mixture isolated from a total cell lysate was processed for analysis and examined as follows:
  • - proteins of the protein mixture were modified by alkylation of cysteine residues and trideuteroacetylation of free amino groups (thereby preventing trypsin cleavage after lysine residues, and neutralising the positive charge of lysine residues, as well as neutralising the positive charge of free N-terminal α-NH₂ groups); the so-modified protein mixture was digested with trypsin;
  • SCX flow-through fraction containing N-terminal, C-terminal and blocked internal peptides, was collected; free amines were acetylated to prevent their side-reaction with NMA, methionine residues were oxidised to their sulfoxide; the peptide mixture was separated by RP-HPLC (1 st separation) and fractions of 4 min wide were collected; - these collected "primary" fractions were dried and re-dissolved in 100 mM NaOH to which
  • NMA nitromalondialdehyde
  • each primary fraction including eight 30 second-wide fractions (said 8 fractions - numbered #1 to #8 - covering the whole collection interval of the respective primary fraction interval) and 2 "post fractions" of 1 min each, numbered #9 and #10; so-isolated secondary fractions of two consecutive primary fractions (primary fractions with retention times 36-40 min and 40-44 min, herein denoted fraction "36-40" and "40-
  • Table 6 lists the 100 unique peptide sequences identified in the above experiment and also indicates: the unique UniProtKB/Swiss-Prot database (http://www.expasy.org/uniprot/) accession number of the protein from which the peptides originated, whether or not they correspond to the C-terminus of a protein, and how many of the actually isolated peptides corresponding to said unique peptide sequences were modified with NMA.
  • the HPLC column was an analytical RP-HPLC column: 2.1 mm internal diameter (I. D.) x 150 mm (length) 300SB-C18 column, Zorbax® (Agilent, Waldbronn, Germany).
  • Example 5 A biomarker discovery platform
  • a biomarker discovery platform may employ a "reference design mode" in which query samples (e.g., from diseased individuals) and control samples (e.g., from healthy individuals) are quantitated relative to a same reference sample pool. Query and control samples are thus compared indirectly. For example, about 10 query samples may be quantified vs. the pool and about 10 control samples may be quantified vs. the same pool.
  • the example below describes the comparison of one sample vs. a reference pool, using an exemplary layout of peptide sorting, separation and identification based on SCX isolation of N-terminal peptides.
  • the proteins were subsequently denatured by adding guanidinium hydrochloride (final concentration 3 M), reduced and alkylated using TCEP and iodoacetamide added in a 25 and 50 molar excess, respectively. Reduction took place at 30 °C during 10 min; alkylation at 60 °C during 1 h.
  • the mixtures were subsequently acetylated at 30 °C during 90 min by adding sulfo-N-hydroxysuccinimide-acetate in a 75 molar excess.
  • the samples were desalted on a PD10 column and captured in a 10 mM NH₄HCO₃ buffer at pH 8. Protein concentrations were measured as 870 µg (sample - 72% recovery) and 910 µg (reference pool - 73% recovery).
  • the samples present in 3.5 ml following PD-10 desalting and buffer exchange were subsequently dried to 2 ml and digested with trypsin in a substrate:trypsin ratio of 50:1 (w:w) by overnight incubation at 37°C.
  • the samples were acidified to pH 6 (by adding 10% FA) the following day and completely dried.
  • 300 µl of H₂¹⁶O was added to the reference pool and 300 µl of H₂¹⁸O to the serum sample, and labelling took place during 40 h.
  • 125 µg ¹⁶O (41.21 µl) and 125 µg ¹⁸O (43.10 µl) labelled samples were subsequently combined in a controlled manner to prevent back-exchange.
  • 2 times 25 µl sample (sample and reference pool) is sufficient for one experiment, thereby reducing the number of depletion runs by a factor of 4.
  • the mixing of both samples at pH 6 appeared to induce an immediate back-exchange. Therefore, the ¹⁶O and ¹⁸O labelled samples were acidified (to pH 3) prior to mixing. In this way back-exchange could be prevented.
  • TFA was advantageously used to acidify the sample, since FA tends to interfere with the successful operation of the SCX column.
  • ACN was added to a final concentration of 50%. The latter was used to prevent non-specific interaction with the SCX column.
  • the final volume was 550 µl (41.21 µl ¹⁶O, 43.10 µl ¹⁸O, 275 µl ACN, 18 µl 1% TFA, 172.7 µl water), allowing the injection of 500 µl onto the SCX column.
  • Injection of relatively large volumes onto the SCX column allows the dilution of salts present in the sample; in the case presented, salts originated from the NH₄HCO₃ buffer. If sample salt concentrations are too high, the binding of internal peptides onto the SCX column might be prevented.
  • the final salt concentration in the sample was ⁇ 20 mM which is not expected to interfere with the successful operation of the SCX column.
  • Higher loop volumes up to 2 ml were tested to further reduce salt concentration but the flow-through volume became too high, resulting in inefficient sample handling.
  • the column used was a Zorbax 300 Angstrom SCX column (2.1 mm ID, 5 cm L, 3.5 µm particle diameter) (Agilent).
  • Stationary phase consists of silica particles with negatively charged residues (sulfonic acid) attached. This residue is charged over a wide pH range.
  • the use of a 15 cm column was also considered, however, flow-through volume was higher and equilibration times 3x longer.
  • the 5 cm column has sufficient capacity to handle 250 µg (125 µg ¹⁶O and 125 µg ¹⁸O).
  • the SCX procedure consists of several steps:
  • the orthogonality of a phenyl and C18 column is limited when they are both operated at low pH. Therefore, the first dimension column was operated at high pH (pH 10).
  • the X-Terra portfolio of columns is specifically designed for operation at higher pH.
  • the nano-LC column was operated at pH 2.
  • the SCX flow-through was dissolved in 500 µl mobile phase A consisting of 10 mM NH₄OAc (pH 10).
  • the entire sample was subsequently injected onto the X-Terra Phenyl LC column. Large volume injection onto the column appeared to be feasible, which is of major importance to limit the sample loss during sample handling.
  • a 60 min ACN gradient was applied to separate the mixture (mobile phase B: 80% ACN, 10 mM NH₄OAc (pH 10)).
  • MS and MS/MS measurements were performed on a 4800 MALDI-TOF/TOF machine in the positive reflectron mode using default calibration.
  • the scan range for the MS spectra stretched from 500-4000.
  • a list of the top 20 signals, per MS spectrum was generated and MS/MS experiments were performed under "metastable precursor on" conditions, without the use of CID (collision induced dissociation) and at 1 keV.
  • the precursor mass window was set at a resolution of 250 FWHM (full width half maximum).
  • Unfiltered MASCOT generic files (mgf) were subsequently searched against both standard and ragged human Sprot databases using MASCOT as search engine. The latter database was used to detect N-terminally ragged peptides which are abundantly present in serum.
  • HILIC can be considered as the reverse of reversed-phase LC. This means that mobile-phase A contains high concentrations of ACN, and mobile phase B high concentrations of water. Water is actually the strongest eluent in HILIC. This separation mode appeared to be highly orthogonal with RPLC at low pH as demonstrated in Figure 8. RPLC separations were performed as described above.
  • the ZIC-HILIC column (15 cm L x 2.1 mm ID x 3.5 µm dp) was operated at a flow rate of 100 µl/min.
  • Mobile phase A consisted of 85% ACN, 20 mM NH₄OAc (pH 6.8) while mobile phase B consisted of 40% ACN, 20 mM NH₄OAc (pH 6.8).
  • a linear gradient between 0 and 100% B was applied in 60 min.
  • One drawback of HILIC separations is the fact that the sample needs to be dissolved in high ACN containing mobile phases (between 80 and 90% ACN), especially when combined with large volume injections (to obtain sufficient focusing onto the column). A number of peptides tend to precipitate at high ACN concentrations.
  • Peptides are first injected and focused onto a short (1.25 cm x 2.1 mm) Zorbax Extend C18 column and after a sufficient loading time, the peptides are eluted onto the HILIC column by placing the C18 column in-line with the HILIC column.
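The twin-peak quantification described earlier (measuring the ratio of the heavy and light peak of each differentially labelled peptide pair) can be illustrated with a minimal sketch. The `TwinPeak` type, the function names and the two-fold threshold are assumptions made for illustration only; the source does not prescribe a data structure or a cut-off.

```python
from dataclasses import dataclass


@dataclass
class TwinPeak:
    """One light/heavy peptide pair from a combined spectrum (hypothetical type)."""
    peptide: str
    light_area: float  # e.g. 16O-labelled peptide from the reference pool
    heavy_area: float  # e.g. 18O-labelled peptide from the query sample


def heavy_to_light_ratio(pair):
    """Relative abundance of the heavy versus the light form of one peptide."""
    if pair.light_area <= 0:
        raise ValueError("light peak area must be positive")
    return pair.heavy_area / pair.light_area


def differential_peptides(pairs, fold_change=2.0):
    """Return (peptide, ratio) for pairs deviating at least `fold_change`-fold from 1.

    The threshold is an illustrative assumption, not a value from the source.
    """
    hits = []
    for pair in pairs:
        ratio = heavy_to_light_ratio(pair)
        if ratio >= fold_change or ratio <= 1.0 / fold_change:
            hits.append((pair.peptide, ratio))
    return hits
```

Peak areas or peak heights may be supplied interchangeably, as long as the same measure is used for both members of a pair.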

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Immunology (AREA)
  • Hematology (AREA)
  • Cell Biology (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Genetics & Genomics (AREA)
  • Food Science & Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention relates to methods and apparatus for the enrichment and/or isolation of a subset of peptides out of complex mixtures of peptides. In particular, the invention contemplates the enrichment and/or isolation of peptides which comprise the N-terminal ends or the C-terminal ends of proteins from which said complex mixtures of peptides are obtained. The methods and apparatus of the invention are particularly applicable for qualitative and/or quantitative proteome analysis. The invention also provides a novel proteomics platform using the principles of the invention.

Description

ISOLATION OF PEPTIDES AND PROTEOMICS PLATFORM
FIELD OF THE INVENTION
The invention relates to methods and apparatus for the enrichment and/or isolation of a subset of peptides out of complex mixtures of peptides. In particular, the invention contemplates the enrichment and/or isolation of peptides which comprise the N-terminal ends or the C-terminal ends of proteins from which said complex mixtures of peptides are obtained. The methods and apparatus of the invention are particularly applicable for qualitative and/or quantitative proteome analysis.
BACKGROUND OF THE INVENTION
The proteome is usually described as the entire complement of proteins found in a biological system, such as, e.g., a cell, tissue, organ or organism. Proteomics is concerned with the study of the proteome expressed at particular times and/or under internal or external conditions of interest. Proteomics approaches frequently aim at global analysis of the proteome, and require that large numbers of proteins, e.g., hundreds or thousands, can be routinely resolved and identified from a single sample.
Among the promises of proteomics is its ability to recognise new biomarkers, i.e., biological indicators that signal a changed physiological state, such as due to a disease or a therapeutic intervention. Biomarker discovery usually involves comparing proteomes expressed in distinct physiological states, and identifying proteins whose occurrence or expression levels consistently differ between said physiological states.
Methods allowing for proteome analysis without the need to purify each protein to homogeneity have been developed. Typically, such methods fragment the proteins of a sample into peptides using agents with known specificity of cleavage (e.g., endoproteinases), fractionate the constituent peptides by chromatography, and determine the mass and sequence of the fractionated peptides by mass spectrometry. The obtained mass and sequence information is used to search sequence databases in order to make out proteins from which the respective peptides originated.
However, proteolysis of complex biological samples can produce thousands of peptides, which may overwhelm the resolution capacity of known chromatographic and mass spectrometric systems, causing incomplete separation and impaired identification of the constituent peptides.
One manner to enable proteomic analysis of biological samples is to reduce the complexity of protein peptide mixtures generated by fragmentation of such samples, before subjecting said peptide mixtures to downstream resolving and identification steps, such as chromatographic separation and/or MS. Ideally, reducing the complexity of protein peptide mixtures will decrease the average number of distinct peptides present per individual protein of the sample, yet will maximise the fraction of proteins of the sample actually represented in the peptide mixture.
WO 02/077016 discloses a methodology ("COFRADIC") for qualitative and/or quantitative proteome analysis, wherein the complexity of the starting protein peptide mixture is reduced as follows: (a) the protein peptide mixture is separated into individual fractions of peptides using chromatography; (b) at least one amino acid of at least some of the peptides in each fraction is enzymatically and/or chemically altered, thus generating a subset of altered peptides and a subset of unaltered peptides for each fraction, and (c) the subset of altered peptides and the subset of unaltered peptides for each fraction are separated via chromatography and the particular subset of interest is isolated for further characterisation. The chromatography of steps (a) and (c) is performed using the same type of chromatography, which allows comparison of the chromatographic properties of the altered peptides.
Accordingly, there exists a need in the art, especially in the field of proteomics, to provide further as well as improved ways for decreasing the complexity of protein peptide mixtures, in particular by isolating therefrom peptides comprising the N- and/or C-terminal ends of the proteins present in starting biological samples. More particularly, there also exists a need to provide for alternative as well as improved methods that can be used in the above step (b) of the method disclosed in WO 02/077016 A2 to allow for distinguishing and isolating peptides comprising the C-terminal ends of proteins from other peptides, e.g., N-terminal and internal peptides, present in the analysed peptide fractions. Given that each protein includes a C-terminus, isolation of such C-terminal peptides provides excellent representation of the majority of said proteins and, at the same time, considerable reduction of the complexity of the peptide mixture to be analysed. Such methods may therefore significantly aid gel-free proteomic analysis of complex biological samples.
In addition, although the COFRADIC methodology is valuable, it involves multiple independent handling steps, which might introduce handling errors and increase labour-intensiveness. Consequently, there also exists a need for proteomic platforms which involve effective, robust and relatively simple (e.g., including a minimum of steps and optimally applied on a whole peptide digest) manners to decrease the complexity of peptide digests, coupled to appropriate steps for resolving and identification of constituent peptides, such as to facilitate comprehensive proteome analysis of complex samples.
SUMMARY OF THE INVENTION
The aspects of the invention address the above discussed needs of the art. In a first group of aspects according to the invention, the inventors contemplate that when a protein or a mixture of proteins, such as, e.g., proteins of a complex biological sample, are fragmented C-terminally adjacent to one or more specific amino acid residue types (herein generically denoted as amino acid residue types "X1", "X2",... "Xn"), then the majority of peptides comprising the C-terminal ends of the starting proteins will not include said one or more amino acid residue types X1, X2,... Xn, unless any of the residue types X1, X2,... Xn was the actual C-terminal residue of the respective protein. In contrast, essentially all peptides originating from the N-terminal ends or from the internal portions of the starting proteins will comprise one of said one or more amino acid residue types X1, X2,... Xn as their last residue.
The invention takes advantage of this situation by reacting a peptide mixture obtained by said fragmentation with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, followed by isolation of those peptides that have not been altered by said agent. The so isolated, unaltered peptides are thus those that did not include any of said one or more amino acid residue types X1, X2,... Xn, and are therefore highly enriched for peptides comprising the C-terminal ends of the starting proteins.
Accordingly, in an aspect, the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of: (a) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture (PPM); and (b) isolating a subset (S) of peptides from said protein peptide mixture (PPM), comprising the steps of: reacting the protein peptide mixture (PPM) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset (S) of peptides unaltered by said reacting.
In the present method, the modification of the amino acid residue types X1, X2,... Xn in, or their removal from, the peptides comprising said amino acid residue types X1, X2,... Xn (i.e., mainly the N-terminal and internal peptides), will suitably change the properties of the so altered peptides to allow the subset (S) of unaltered peptides (i.e., primarily peptides comprising the C-termini of the starting proteins) to be distinguished and isolated from the altered peptides.
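The selection logic of steps (a) and (b) can be illustrated in silico. The following is a minimal sketch that assumes idealised, complete cleavage and a perfectly specific agent; the function names and the example sequence are hypothetical and serve only to show why the unaltered subset is enriched for C-terminal peptides.

```python
def fragment(protein, cleave_after):
    """Cleave C-terminally adjacent to every residue in `cleave_after`.

    Idealised model: cleavage is complete and has no sequence-context effects.
    """
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in cleave_after:
            peptides.append(current)
            current = ""
    if current:  # trailing peptide that does not end in a cleavage residue
        peptides.append(current)
    return peptides


def unaltered_subset(peptides, target_residues):
    """Peptides lacking every target residue, i.e. those the agent cannot alter."""
    return [p for p in peptides if not any(r in target_residues for r in p)]


# Hypothetical sequence, cleaved after Arg only (n=1, X1=Arg).
peptides = fragment("MTAYRAGLRSVEDF", {"R"})
print(peptides)                           # ['MTAYR', 'AGLR', 'SVEDF']
print(unaltered_subset(peptides, {"R"}))  # ['SVEDF']
```

If the protein itself ends in one of the residue types X1, X2,... Xn (for example the hypothetical sequence "MTAYR"), its C-terminal peptide is altered like any other and is absent from the subset, which is the exception noted in the text.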
Preferably, the modification of the amino acid residue types X1, X2,... Xn in, or their removal from, the peptides comprising said residue types X1, X2,... Xn will change the chromatographic behaviour of the so altered peptides, allowing said subset (S) of unaltered peptides to be distinguished and isolated from the altered peptides by chromatography.
In this respect, the inventors have recognised that the above method for isolating peptides comprising C-terminal ends of proteins may be very preferably used in conjunction with the overall method of WO 02/077016 A2, i.e., the step of altering the peptides comprising the amino acid residue types X1, X2,... Xn may be interposed between two chromatographic separations of the same type, wherein the peptide alteration step modifies the chromatographic behaviour of the altered peptides in the second chromatographic separation.
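The principle of interposing the alteration between two separations of the same type can be sketched as a simple retention-time filter: peptides that were not altered should elute in the second run within the window of the primary fraction they came from, whereas altered peptides shift out of it. The function name, the tolerance parameter and the peptide labels below are assumptions for illustration; the 36-40 min window mirrors one of the primary fractions mentioned in the examples.

```python
def isolate_unaltered(secondary_rt, window, tolerance=0.0):
    """Keep peptides whose second-run retention time still lies in the primary window.

    secondary_rt: mapping of peptide label -> retention time (min) in the second run
    window:       (start, end) of the primary fraction in minutes
    tolerance:    optional allowance (min) for run-to-run drift; an assumption
    """
    start, end = window
    return sorted(
        name
        for name, rt in secondary_rt.items()
        if start - tolerance <= rt <= end + tolerance
    )


# Hypothetical second-run retention times for peptides of the 36-40 min fraction.
second_run = {"pepA": 37.2, "pepB": 45.9, "pepC": 39.5, "pepD": 31.0}
unaltered = isolate_unaltered(second_run, (36.0, 40.0))
```

In practice the collection window is set by the instrument and the observed shifts, so the filter above only conveys the selection idea, not an acquisition protocol.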
Accordingly, in a preferred embodiment, the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of:
(a) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture (PPM); and
(b) isolating a subset (S) of peptides from said protein peptide mixture (PPM), comprising the steps of:
(ba) separating the protein peptide mixture (PPM) into fractions of peptides via chromatography,
(bb) reacting at least one and preferably each peptide fraction from step (ba) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, thereby obtaining altered and unaltered peptides for each so reacted fraction, and
(bc) isolating the subset of unaltered peptides out of each so reacted fraction via chromatography,
wherein the chromatography of steps (ba) and (bc) is performed with the same type of chromatography.
As noted, the protein (P) or the mixture of proteins (PM) are fragmented preferentially at peptide bonds C-terminally adjacent to either one (X1) or to two or more different amino acid residue types X1, X2,... Xn. To reduce the chance that the actual C-terminal residue of the analysed proteins is any of the residue types X1, X2,... Xn, it may be advantageous to fragment the proteins C-terminally adjacent to a relatively small number of amino acid residue types X1, X2,... Xn. Preferably, the protein (P) or the mixture of proteins (PM) may be fragmented at peptide bonds C-terminally adjacent to 5 or fewer amino acid residue types (i.e., n≤5), more preferably 4 or fewer amino acid residue types (i.e., n≤4), even more preferably 3 or fewer amino acid residue types (i.e., n≤3), still more preferably 2 or fewer amino acid residue types (i.e., n≤2) and most preferably the protein (P) or mixture of proteins (PM) may be fragmented at peptide bonds C-terminally adjacent to only 1 amino acid residue type (i.e., n=1).
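The reasoning behind keeping n small can be made concrete: every residue type added to X1, X2,... Xn can only increase the share of proteins whose own C-terminal residue is one of those types, and whose C-terminal peptide would therefore be altered and lost. The sketch below assumes a list of protein sequences in one-letter code; the function name and the toy sequences are hypothetical and carry no experimental meaning.

```python
def fraction_with_terminal_in(proteins, residue_types):
    """Fraction of proteins whose actual C-terminal residue belongs to `residue_types`."""
    if not proteins:
        raise ValueError("need at least one protein sequence")
    hits = sum(1 for seq in proteins if seq[-1] in residue_types)
    return hits / len(proteins)


# Hypothetical toy sequences, used only to show the trend.
toy_proteins = ["MTAYR", "AGLK", "SVEDF", "MPLW"]
for residue_types in ({"R"}, {"R", "K"}, {"R", "K", "F"}):
    share = fraction_with_terminal_in(toy_proteins, residue_types)
    print(sorted(residue_types), share)
```

Applied to a real sequence database, the same count would give an estimate of how many C-termini a given choice of cleavage residues is expected to miss.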
Whereas the fragmentation may take place at peptide bonds C-terminally adjacent to substantially any type of amino acid residue, suitable frequency of cleavage may be preferably achieved when the fragmentation takes place C-terminally adjacent to one or more of the 20 common amino acid residue types found in natural proteins, and/or to one or more of residues obtained from any of the 20 common amino acid residue types by suitable modification of the starting proteins (e.g., modification of lysine to homoarginine). Hence, in a preferred embodiment, the protein (P) or the mixture of proteins (PM) are fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X1, X2,... Xn, wherein said one or more amino acid residue types X1, X2,... Xn are chosen from the group consisting of: glycine (Gly, G), proline (Pro, P), alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), methionine (Met, M), cysteine (Cys, C), phenylalanine (Phe, F), tyrosine (Tyr, Y), tryptophan (Trp, W), histidine (His, H), lysine (Lys, K), arginine (Arg, R), glutamine (Gln, Q), asparagine (Asn, N), glutamic acid (Glu, E), aspartic acid (Asp, D), serine (Ser, S) and threonine (Thr, T), or a residue obtained from any of the above by suitable modification.
As mentioned, the present method may involve reacting a protein peptide mixture obtained by fragmentation of proteins C-terminally adjacent to one or more amino acid residue types X1, X2,... Xn, with an agent capable of specifically modifying said residue types X1, X2,... Xn. The amino acid residue types X1, X2,... Xn may therefore be preferably selected from amino acid types whose side chains comprise comparably reactive moieties. For example, in a preferred embodiment, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X1, X2,... Xn, wherein said one or more amino acid residue types X1, X2,... Xn comprise a moiety chosen from mercapto, methylthio, hydroxyphenyl, primary amino, secondary amino (including, inter alia, indyl, pyrrolidinyl and imidazyl, preferably indyl and imidazyl), guanidino, ureyl or carboxyl. For example, in preferred embodiments, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X1, X2,... Xn, wherein said one or more amino acid residue types X1, X2,... Xn are chosen from: the group consisting of Met, Cys, Tyr, Trp, His, Pro, Lys, Arg, hArg, Glu and Asp; or the group consisting of Met, Cys, Tyr, Trp, His, Lys, Arg, hArg, Glu and Asp; or the group consisting of His, Lys and Arg; or the group consisting of Lys and Arg; or the group consisting of Met and Cys; or the group consisting of Tyr and Trp; or the group consisting of Asp and Glu.
In preferred embodiments of the invention, fragmenting of the protein (P) or the mixture of proteins (PM) may be effected enzymatically, preferably by an endoproteinase, more preferably by trypsin.
In a preferred embodiment, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to amino acid residue(s) that comprise a guanidino moiety, such as, e.g., arginine and/or homoarginine, and/or C-terminally adjacent to lysine, wherein the lysine may be advantageously converted to homoarginine subsequent to the fragmentation. In the so obtained protein peptide mixture (PPM), peptides comprising a guanidino moiety will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides lacking a guanidino moiety will mostly originate from and comprise the C-terminal ends of the starting protein(s). Hence, in related embodiments, the fragmentation may preferentially occur at peptide bonds C-terminally adjacent to arginine and/or homoarginine and/or lysine residues of said protein (P) or mixture of proteins (PM), such as, e.g., C-terminally adjacent to Arg (i.e., n=1, X1=Arg); or to Arg and homoarginine (i.e., n=2, X1=Arg, X2=hArg); or to Arg and Lys (i.e., n=2, X1=Arg, X2=Lys); etc. Homoarginine (hArg) may be preferably introduced to proteins before the fragmentation by a suitable modification of Lys. Lys may be preferably converted to hArg after the fragmentation.
Suitable modification of the guanidino moiety can discriminate those peptides of the protein (P) or protein peptide mixture (PPM) that comprise a guanidino moiety (altered by said modification) from those that do not (unaltered by said modification), and thereby allow to isolate the latter, mainly C-terminal, peptides. The invention contemplates advantageous manners to modify peptides that include a guanidino moiety, such as, e.g., peptides with Arg or hArg, more preferably with Arg. For example, in preferred, but non-limiting embodiments, peptides comprising a guanidino moiety may be modified by reacting with an agent chosen from a dicarbonyl compound or derivative thereof (such as, e.g., preferably with an arylglyoxal, more preferably phenylglyoxal or hydroxyphenylglyoxal, or also very preferably with nitromalondialdehyde), a peptidylarginine deiminase or an arginase.
In a further preferred embodiment, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to a basic amino acid residue, more preferably C-terminally adjacent to Arg and/or hArg and/or Lys. In the so obtained protein peptide mixture (PPM), peptides comprising said basic amino acid (preferably Arg, hArg or Lys) as their last residue will mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides not having such basic amino acid as their last residue will mostly originate from and comprise the C-terminal ends of the starting protein(s). Hence, in related embodiments, the fragmentation may preferentially occur at peptide bonds C-terminally adjacent to Arg and/or hArg and/or Lys of said protein (P) or mixture of proteins (PM), such as, e.g., C-terminally adjacent to Arg (i.e., n=1, X1=Arg); or to Arg and Lys (i.e., n=2, X1=Arg, X2=Lys); or to Arg and hArg (i.e., n=2, X1=Arg, X2=hArg); etc. Homoarginine (hArg) may be preferably introduced to proteins before the fragmentation by a suitable modification of Lys.
Removal of the basic last residue, preferably Arg, hArg or Lys, can discriminate those peptides of the protein (P) or protein peptide mixture (PPM) that contained such basic last residue from those that did not, and thereby allow isolation of the latter, mainly C-terminal, peptides. The invention contemplates advantageous manners to remove basic last residues, preferably Arg, hArg or Lys, from peptides, e.g., preferably, but without limitation, using carboxypeptidase B. An added advantage of fragmenting proteins C-terminally adjacent to Arg and/or hArg and/or Lys, preferably to Arg and/or Lys, as above, is that such cleavage may be achieved using trypsin, which - due to its high specificity and efficiency of proteolysis - is a particularly preferred endoproteinase for proteomics applications (trypsin cleaves preferentially C-terminally adjacent to Arg and Lys and, to a lesser extent, after hArg).
In a further development of the invention, embodiments which involve fragmenting of the protein (P) or the protein mixture (PM) preferentially C-terminally adjacent to a basic residue (such as, e.g., C-terminally adjacent to Arg and/or hArg and/or Lys and/or His, more preferably Arg and/or hArg and/or Lys, even more preferably Arg and/or Lys, yet more preferably Arg, or also preferably Lys) may benefit from an additional step to enrich for C-terminal peptides from the protein (P) or protein mixture (PM). Preferably, said additional step may be performed after the protein (P) or the protein mixture (PM) has been suitably fragmented and before the isolation of C-terminal peptides as disclosed in step (b) above.
Departing from a protein peptide mixture (PPM) obtained by fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more basic amino acid residue types X1, X2, ... Xn, the additional enrichment step comprises isolating, under conditions where the majority of the C-terminal -C(=O)OH groups of the peptides of the protein peptide mixture (PPM) are dissociated, i.e., -C(=O)O⁻, the majority of the N-terminal -NH2 groups of said peptides are protonated, i.e., -NH3⁺, and the basic side chain moiety of the majority of basic amino acids adjacent to which the protein (P) or the mixture of proteins (PM) were proteolysed are protonated, a subset (S') of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset (S') have about zero net charge under said conditions. When the protein (P) or the mixture of proteins (PM) are proteolysed C-terminally adjacent to their basic amino acid residues, most peptides comprising the C-terminal ends of the starting proteins will not include a basic amino acid (unless a basic amino acid was the actual C-terminal residue of the respective protein), whereas the majority or substantially all peptides originating from the N-terminal ends or from the internal portions of the starting proteins will comprise a basic amino acid as their last residue.
Therefore, under the above defined conditions, the peptides comprising C-terminal ends of the respective proteins will in general have about zero net charge, whereas N-terminal and internal peptides will in general display net charge of about +1 or higher (departure from this situation may occur for peptides which, under said conditions, would contain additional charged side chain groups, such as, e.g., carboxyl, phosphate or sulphonate, or, where proteolysis does not take place after each basic residue (e.g., not after histidine, e.g., when trypsin is used), a charged basic group) (departure from the above general situation may also occur for N-terminal peptides derived from naturally N-terminally acetylated proteins).
This difference between the net charge of the peptides comprising the C-terminal ends of proteins and the remaining peptides of the protein peptide mixture (PPM) allows for separating said subsets of peptides and for isolating or at least enriching for the subset (S') of C-terminal peptides.
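The charge bookkeeping behind this enrichment step can be illustrated with a minimal sketch. It is an idealised counting model only, assuming the conditions stated above hold exactly (C-terminal carboxyl dissociated, free N-terminal amino group protonated, basic side chains protonated, acidic side chains not dissociated). The function name `net_charge`, the one-letter residue codes and the example sequences are hypothetical and do not come from the specification; homoarginine has no standard one-letter code and is not represented.

```python
# Idealised net charge of a peptide under the step (ii) conditions.
# Assumption: every Lys (K), Arg (R) and His (H) side chain carries +1,
# Asp/Glu side chains are neutral, and no other charged groups are present.
BASIC_RESIDUES = frozenset("KRH")


def net_charge(peptide: str, n_term_blocked: bool = False,
               basic: frozenset = BASIC_RESIDUES) -> int:
    """Return the integer net charge predicted by the counting model."""
    charge = -1  # dissociated C-terminal carboxylate
    if not n_term_blocked:
        charge += 1  # protonated free N-terminal amino group
    # one positive charge per protonated basic side chain
    charge += sum(1 for residue in peptide if residue in basic)
    return charge


if __name__ == "__main__":
    # Hypothetical peptides: one without and one with a basic last residue.
    for sequence in ("AGDLSEF", "AGDLSER"):
        print(sequence, net_charge(sequence))
```

In this model a peptide lacking any basic residue comes out at zero net charge, while a peptide ending in Arg or Lys comes out at +1, which is the difference the separation relies on. Real peptides deviate from the model in the cases listed in the text (additional charged groups, uncleaved basic residues, naturally acetylated N-termini).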
Hence, in a preferred embodiment, the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of: (i) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture (PPM), wherein said one or more amino acid residue types X1, X2,... Xn are basic; (ii) isolating, under conditions where the majority of the C-terminal -C(=O)OH groups of the peptides of the protein peptide mixture (PPM) are dissociated, i.e., -C(=O)O⁻, the majority of the N-terminal -NH2 groups of said peptides are protonated, i.e., -NH3⁺, and the basic side chain moiety of the majority of basic amino acids adjacent to which the protein (P) or the mixture of proteins (PM) were proteolysed are protonated, a subset (S') of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset (S') have about zero net charge under said conditions; and (iii) isolating a subset (S) of peptides from said subset of peptides (S'), comprising the steps of: reacting the subset (S') with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset (S) of peptides unaltered by said reacting.
Step (iii) may preferably comprise steps (ba), (bb) and (bc) as taught above. It shall be appreciated that the order of steps (ii) and (iii) can be reversed in the above method.
It shall also be appreciated that a workable method for isolating or enriching C-terminal peptides can already be obtained comprising steps (i) and (ii) and not step (iii). The inventors found that addition of step (ii) to the methods of the invention disclosed herein can provide for a significant, synergic, boost in the number of C-terminal peptides that can be identified with any of said methods alone.
To reduce the effect of acidic amino acids, such as aspartic acid and glutamic acid, on the performance of step (ii) above, the conditions of step (ii) are preferably such that the acidic side chain moiety (in particular the -COOH side chain moiety) of the majority of said acidic amino acids, is not dissociated.
Regarding the conditions of step (ii), the pH may be preferably between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.25, and most preferably about 3. For example, said subsets of peptides may be resolved and the subset (S') isolated or enriched by cation exchange chromatography, preferably by strong cation exchange ("SCX") chromatography.
Where the fragmentation of the protein (P) or protein mixture (PM) in step (i) does not occur C-terminally adjacent to each type of basic amino acid, the charge of at least some or all of the basic amino acid types after which the proteolysis does not occur may be advantageously neutralised by suitable modification (e.g., acetylation for lysine, etc.). In particular, this will thwart the effect that the presence of such basic amino acid types in peptides derived from the C-terminal ends of the proteins would have on the charge of said peptides, and thereby ensure an about zero net charge for a greater proportion of the C-terminal peptides and improve their enrichment in step (ii).
In a further embodiment, the N-terminal -NH2 groups of the protein (P) or the mixture of proteins (PM) may be blocked before the fragmentation, such that the so-blocked groups are not protonated under the conditions of said additional enrichment step (a considerable proportion of α-NH2 groups of proteins may already be acetylated in nature). Consequently, peptides comprising the N-terminal ends of the protein (P) or mixture of proteins (PM) will also generally display zero net charge under said conditions and may be isolated or enriched for alongside the peptides comprising the C-terminal ends in step (ii).
Related aspects concern apparatus for performing the methods of the invention, uses of the methods of the invention in proteomic analysis, particularly for gel-free proteomic analysis of complex biological samples, etc.
In a second group of aspects of the invention, the invention generally provides a method for protein identification and optionally quantification from a protein mixture comprising the steps: (a) fragmenting a mixture of proteins (PM) to obtain a protein peptide mixture (PPM); (b) isolating from the protein peptide mixture PPM:
(ba) peptides comprising the N-terminal ends of proteins of the mixture of proteins PM (i.e., N-terminal peptides), and/or
(bb) peptides comprising the C-terminal ends of proteins of the mixture of proteins PM (i.e., C-terminal peptides); (c) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
(d) identifying and optionally quantifying one or more N-terminal and/or C-terminal peptides from one or more of said fractions, whereby said identified N-terminal and/or C-terminal peptides represent one or more proteins from the mixture of proteins PM.
Given that each protein includes an N-terminus and a C-terminus, enrichment for N-terminal and/or C-terminal peptides provides excellent representation of the majority of proteins of the analysed sample and, at the same time, considerable reduction of the complexity of the peptide mixture to be analysed. In an embodiment, N-terminal peptides are isolated and analysed. In another embodiment, C-terminal peptides are isolated and analysed. In a further embodiment, N-terminal and C-terminal peptides are isolated and analysed.
Herein, the inventors have realised advantageous manners that allow for robust and straightforward sorting of N-terminal and/or C-terminal peptides, even from relatively complex peptide mixtures. In addition, the inventors have achieved conditions that allow for satisfactory resolution of so-isolated N-terminal and/or C-terminal peptides - even while said peptides can represent rather complex parent protein samples - so as to facilitate identification and optionally quantification of the constituent N-terminal and/or C-terminal peptides. In particular, as realised by the inventors, suitable resolution conditions may involve one-dimensional long-column chromatography, which can achieve adequate peptide resolution due to the increased column length. Alternatively, a multidimensional separation process, such as preferably but without limitation orthogonal 2D-chromatography, can also achieve satisfactory separation of the N-terminal and/or C-terminal peptides isolated herein. Accordingly, the methods and systems described herein advantageously allow for comprehensive proteomic analysis of considerably complex protein mixtures, e.g., protein mixtures obtained from relevant biological samples.
In particular, the invention contemplates ways in which N-terminal and/or C-terminal peptides can be isolated in step (b) from protein peptide mixtures (PPM) obtained by fragmentation of the starting protein mixtures (PM) using trypsin, which tends to be favoured in proteomics applications due to its high specificity and efficiency of proteolysis. Hence, in a preferred embodiment the mixture of proteins (PM) is fragmented using trypsin or trypsin-like protease to obtain the protein peptide mixture (PPM).
Preferably, in step (b) N-terminal and/or C-terminal peptides can be isolated or enriched herein from tryptic digests on the basis of a difference in net charge between the majority of or substantially all N-terminal and/or C-terminal peptides compared to the majority of or substantially all remaining peptides.
In particular, trypsin cleaves proteins C-terminally adjacent to Arg and Lys residues (except where the ensuing residue is Pro). Consequently, trypsin cleavage generates a protein peptide mixture (PPM) wherein the majority of or substantially all C-terminal peptides do not contain Arg or Lys (unless Arg or Lys was the last C-terminal residue of the corresponding protein), whereas the majority of or substantially all N-terminal and internal peptides do contain Arg or Lys as their last residue.
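The cleavage rule just described can be sketched in a few lines of code. This is a simplified in silico digest that assumes complete, fully specific cleavage after every Arg and Lys not followed by Pro; real digests also contain missed cleavages and non-specific products. The function names `tryptic_digest` and `ends_with_arg_or_lys` and the example sequence are hypothetical illustrations, not part of the specification.

```python
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Split a one-letter protein sequence after K or R, except before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        # cleave C-terminally of Arg/Lys unless the next residue is Pro
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # the C-terminal peptide
    return peptides


def ends_with_arg_or_lys(peptide: str) -> bool:
    """True if the last residue of the peptide is Arg or Lys."""
    return peptide[-1] in "KR"


if __name__ == "__main__":
    # Hypothetical sequence containing an Arg-Pro bond that is not cleaved.
    for peptide in tryptic_digest("MKTAYRPGLKESVNRDF"):
        print(peptide, ends_with_arg_or_lys(peptide))
```

In the example, every peptide except the last one ends in Arg or Lys, whereas the final (C-terminal) peptide does not. A protein whose own last residue is Arg or Lys would yield a C-terminal peptide that does end in a basic residue, which is the exception noted in the text.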
Under conditions where substantially all Arg and Lys side chains are protonated (preferably under acidic conditions, more preferably at pH about 4.0 or less, even more preferably pH about 3.0 or less), N-terminal and internal peptides will thus in general carry an extra positive charge compared to C-terminal peptides. More in particular, the majority of or substantially all C-terminal peptides will display about zero net charge, whereas the majority of or substantially all N-terminal and internal peptides will show about +1 net charge. Departure from this general situation might occur for some peptides, for example: where C-terminal peptides contain Lys or Arg, e.g., as the last residue or as Lys-Pro or Arg-Pro; where peptides contain amino acids whose side chains may be charged under the above conditions, such as, e.g., His, Asp or Glu; where peptides contain other moieties that may be charged under the above conditions, such as, e.g., phosphate or sulphonate; where N-terminal peptides originate from naturally α-NH2 acetylated proteins and hence lack a protonated α-NH2 group under the above conditions; or where some N-terminal or internal peptides are produced by non-specific cleavage (non-tryptic peptides) and do not contain Arg or Lys. Hence, in an embodiment C-terminal peptides may be isolated or enriched from a protein mixture (PM) using steps comprising: proteolysing a mixture of proteins (PM) by trypsin or trypsin-like protease to obtain a protein peptide mixture (PPM); isolating, under conditions where substantially all Arg and Lys side chains are protonated, a subset of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset have about zero net charge under said conditions.
Preferably, said conditions may encompass acidic conditions, more preferably pH about 4.0 or less, even more preferably pH about 3.0 or less, such as, e.g., pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5. Such relatively low pH may advantageously also ensure that the side chain -COOH groups of Asp and Glu do not dissociate and thus do not influence the peptide net charge.
In a further embodiment, the N-terminal α-NH2 groups of proteins in the protein mixture (PM) may be blocked (e.g., acylated, preferably acetylated) to prevent their protonation under acidic conditions. Consequently, following tryptic digest of so-blocked protein mixture (PM), not only C-terminal peptides, but also the majority of or substantially all N-terminal peptides will display about zero net charge under conditions where the side chains of substantially all Arg and Lys (if not blocked) are protonated.
Hence, in an embodiment N-terminal and C-terminal peptides may be isolated or enriched from a protein mixture (PM) using steps comprising: blocking α-NH2 groups of proteins in a mixture of proteins (PM) to prevent their protonation under acidic conditions; proteolysing the protein mixture (PM) by trypsin or trypsin-like protease to obtain a protein peptide mixture (PPM); isolating, under conditions where substantially all Arg and Lys (if not blocked) side chains are protonated, a subset of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset have about zero net charge under said conditions.
Preferably, said conditions may encompass acidic conditions, more preferably pH about 4.0 or less, even more preferably pH about 3.0 or less, such as, e.g., pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5. In this embodiment, the modification of -NH2 groups in the protein mixture (PM) may but need not also modify side chain primary amino groups, particularly the ε-NH2 groups of Lys. If ε-NH2 groups of Lys are modified, this can advantageously allow isolation of C-terminal peptides containing Lys.
The subset of peptides displaying about zero net charge in the above embodiments can be isolated from peptides of other net charge, particularly peptides showing about +1 net charge, using any method capable of distinguishing analytes on the basis of net charge difference. In an embodiment, said method may be ion exchange chromatography (IEC), preferably cation exchange chromatography (CEC), more preferably strong cation exchange (SCX) chromatography. Herein, the peptide subset having about zero net charge will elute faster and/or under less stringent conditions (e.g., lower ionic strength) than the peptides having about +1 net charge, thereby allowing separation of the subsets.
Advantageously, when the CEC such as SCX chromatography has adequate capacity, the separation can be performed on the entire protein peptide mixture (PPM), in a single step, and may recover the peptides of interest in a single fraction, which greatly increases the capacity, ease of handling and throughput of the present proteomics methods. Preferably, the CEC such as SCX chromatography is columnar. In an embodiment, the eluate containing the peptides of interest may be collected and optionally further manipulated before subjecting it to further steps of the present methods. In a preferred embodiment, the eluate containing the peptides of interest may be directly (on-line) fed to a system performing the ensuing separation steps of the present method. Where the above methods isolate a mixture of N- and C-terminal peptides, a further separation step may be inserted to isolate or enrich from said mixture the subset of N-terminal peptides or the subset of C-terminal peptides.
For example, taking advantage of the fact that the α-NH2 group of the N-terminal peptides is blocked while that of the C-terminal peptides is free, the C-terminal peptides may be selectively captured by a capturing agent specific for primary amino acid groups, such as without limitation a crown ether capture agent (e.g., 18-crown-6).
Alternatively or in addition, C-terminal peptides may be distinguished from blocked N-terminal peptides based on the difference between the basicity of free α-NH2 groups (present in C-terminal peptides but blocked in N-terminal peptides) vs. the basicity of Arg and Lys side chains (present in N-terminal peptides but absent in most C-terminal peptides). In particular, α-amino groups have comparably lower pKa (pKa = ~9-10) than Arg and Lys side chains (pKa = ~12.48 and ~10.53, respectively). Accordingly, at pH in between the pKa values of α-amino groups on the one hand and Arg and/or Lys side chains on the other hand (note that Lys side chains may be blocked), a greater fraction of the former will have lost their proton while a greater fraction of the latter will have retained their proton. The resulting net charge difference between the C-terminal and N-terminal peptides at such pH allows their separation, e.g., by ion exchange chromatography.
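The pH dependence described here follows the Henderson-Hasselbalch relation, under which the protonated fraction of a basic group equals 1 / (1 + 10^(pH - pKa)). The sketch below applies that relation to the pKa values quoted in the text. The α-amino value of 9.5 is an assumed midpoint of the stated range of about 9 to 10, and the function name `protonated_fraction`, the dictionary `PKA` and the example pH of 10.0 are hypothetical choices made only for illustration.

```python
# pKa values: Arg and Lys as quoted in the text; the alpha-amino value is an
# assumed midpoint of the quoted ~9-10 range, not a figure from the text.
PKA = {"alpha_amino": 9.5, "Lys": 10.53, "Arg": 12.48}


def protonated_fraction(pH: float, pKa: float) -> float:
    """Fraction of a basic group still carrying its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    example_pH = 10.0  # illustrative value lying between the pKa values
    for group, pKa in PKA.items():
        fraction = protonated_fraction(example_pH, pKa)
        print(f"{group}: {fraction:.3f} protonated at pH {example_pH}")
```

At a pH lying between the two sets of pKa values, the computed fraction for the α-amino group is below one half while the fractions for the Lys and Arg side chains are above one half, which is the charge difference the text proposes to exploit. Actual pKa values shift with sequence context and solvent, so the numbers are indicative only.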
The invention further contemplates embodiments in which the protein mixture (PM) and/or the protein peptide mixture (PPM) are modified such as to introduce differently charged moieties selectively on the N-terminal, C-terminal and/or internal peptides. The so-generated net charge differences between the peptides allow to isolate the desired peptides from the protein peptide mixture (PPM). For example, in a non-limiting embodiment one or more positive charges may be introduced to N-terminal peptides while one and preferably two or more negative charges may be introduced to internal and C-terminal peptides. The N-terminal peptides may then be bound onto a CEC, preferably SCX column, while the remaining peptides would be found in the eluate. Subsequent elution of the N-terminal peptides would not only decrease the complexity of the peptide mixture, but may optionally also separate different fractions of the N-terminal peptides based on their propensity for ionic interactions. In addition, in cases where weak bases can be attached, such charged N-terminal peptides may be more prone to peptide fragmentation during MS analysis. In an embodiment, the N-terminal and/or C-terminal peptides isolated or enriched for as herein may be subjected to a multidimensional separation process. In a "multidimensional" separation process a sample of analytes is subjected to a sequence of two or more separation steps ("dimensions"), each of which acts upon all or a part of analytes separated in a previous separation step, wherein any two analytes resolved in a given separation step remain resolved in subsequent separation steps, and wherein the distinct separation steps resolve analytes on the basis of different physical and/or chemical properties. Typically, to realise a multidimensional separation, any or all fractions from a given separation step may be each individually resolved in a subsequent separation step. 
To obtain best resolution of peptide fractions from a given separation step in a subsequent separation step, the conditions in said steps are preferably orthogonal, such that peptides not resolved (i.e., recovered in same fraction) in one step will be resolved in a further step.
Typically, the present multidimensional separation process may involve 4 separation steps or less, preferably 3 separation steps or less. More preferably, the separation process is two-dimensional (2D). In an embodiment, the stages of the separation process may be coupled in an on-line system.
In an embodiment, one or more or all separation steps of the multidimensional separation process may be by chromatography. In a preferred embodiment, all separation steps may be by chromatography (multidimensional chromatography). For example, the separation process may be 4D-, preferably 3D-, more preferably 2D-chromatography. Chromatographic step(s) of the multidimensional separation process may involve suitable stationary phases, mobile phases (e.g., linear or gradient) and elution conditions known per se.
In an embodiment, the physical and/or chemical properties based on which peptides can be resolved in the distinct steps of the multidimensional separation process may be chosen from inter alia net charge, electrophoretic mobility (EPM), isoelectric point (pI), molecular size and/or ability or tendency to form certain type(s) of molecular interactions, such as, e.g., dispersive (hydrophobic) interactions, dipole-dipole polar interactions (e.g., hydrogen bonding), dipole-induced dipole polar interactions (e.g., π-π interactions) or ionic interactions.
Said properties may be evaluated using a variety of separation techniques known per se in the art, and which may constitute separation steps of the multidimensional process. For example, numerous chromatographic and electrophoretic applications exist to resolve peptides on the basis of the above described properties including inter alia reversed phase high performance liquid chromatography (RP-HPLC), hydrophobic interaction chromatography (HIC), normal-phase HPLC (NP-HPLC), hydrophilic interaction liquid chromatography (HILIC), chromatofocusing, size exclusion chromatography (SEC), ion exchange chromatography (IEC), affinity chromatography (AC), capillary gel electrophoresis (CGE), capillary zone electrophoresis (CZE), micellar electrokinetic chromatography (MEKC), free flow electrophoresis (FFE), isoelectric focusing (IEF) including capillary isoelectric focusing (CIEF), capillary isotachophoresis (CITP), capillary electrochromatography (CEC), and the like.
In an embodiment, one or more chromatographic separation steps may involve reversed phase (RP) chromatography, preferably RP liquid chromatography, more preferably RP-HPLC. Exemplary stationary phases for RP chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as but not limited to: aliphatic hydrocarbon moieties, e.g., straight, branched and/or alicyclic, saturated or unsaturated aliphatic hydrocarbon moieties of between 2 and 30 carbon atoms, preferably including straight or branched, more preferably straight, alkyl moieties, more preferably alkyl moieties having between 2 and 30 carbons, such as, e.g., about 18 (octadecyl), about 8 (octyl), 4 (butyl), 3 (propyl) or 2 (ethyl) carbon atoms; aromatic moieties, such as aryl, arylalkyl, aryloxy, heteroaryl or heteroarylalkyl groups, optionally substituted with:
• one or more electron-withdrawing substituents, such as, e.g., -COR, nitro (-NO2), fluorine (-F) or ammonium (-+NR3, -+NHR2, -+NH2R) groups, wherein R is an alkyl; preferred examples include inter alia trinitrophenyl or pentafluorophenyl moiety; or
• one or more electron-donating moieties, such as, e.g., hydroxyl (-OH), alkyloxy such as methoxy (-OMe) or amino (-NR2, -NHR) groups where R is an alkyl; preferred examples include inter alia phenyl, diphenyl, p-methoxyphenyl and 4-N,N-dimethylaminophenyl moieties.
Aromatic moieties as such or substituted can potentially add other types of interactions apart from hydrophobic interactions, such as inter alia π-π interactions. In an embodiment, one or more chromatographic separation steps may involve hydrophilic interaction chromatography (HILIC), such as disclosed by Alpert AJ 1990 (J Chromatogr 499: 177-96) and later developments thereof. Exemplary neutral polar stationary phases for HILIC chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as but not limited to: polar moieties such as, e.g., amino, diol, amide, or polyhydroxyethyl aspartamide; or zwitterionic groups (ZIC-HILIC), such as, e.g., -N+(CH3)2CH2CH2CH2SO3-. Un-derivatised silica is also a commonly used stationary phase for HILIC.
In a preferred embodiment, the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is RP-HPLC operated at low pH (particularly, pH less than about 5, preferably less than about 4, more preferably less than about 3, such as pH between about 0.5 and about 3 or pH between about 1 and 2.5, yet more preferably pH about 2); and the other dimension (2nd or 1st) is RP-HPLC operated at high pH (particularly, pH more than about 8, preferably more than about 9, such as pH between about 9 and about 12 or pH between about 9 and 11, yet more preferably pH about 10). As realised, orthogonality between RP and RP chromatography may be achieved on the basis of pH difference, and often regardless of the RP-functional moiety, since high- and low-pH RP have proven to be orthogonal (see Figure 5).
In an exemplary embodiment, the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is RP-HPLC using stationary phase functionalised with a C18 moiety operated at low pH; and the other dimension (2nd or 1st) is RP-HPLC using stationary phase functionalised with a C18 or phenyl, etc. moiety operated at high pH. Preferably, C18 column separation at low pH can be employed as the second (or ultimate) dimension, since it can be made highly compatible with a downstream MS analysis.
In another preferred embodiment, the multidimensional separation process comprises or consists of two-dimensional chromatography, wherein one dimension (1st or 2nd) is chromatography, preferably RP-HPLC, using stationary phase functionalised with a C18 moiety and the other dimension (2nd or 1st) is HILIC chromatography, preferably ZIC-HILIC. The inventors have realised that the above multidimensional modes provide highly orthogonal separation of the N- and/or C-terminal peptides of interest. Other multidimensional separation processes of interest for the invention may comprise or consist of an electrophoretic separation step (e.g., FFE, CIEF, CZE) preferably as a 1st dimension, in conjunction with chromatographic separation such as RP-HPLC or HILIC.
In an alternative, the N-terminal and/or C-terminal peptides isolated or enriched for as herein may be subjected to a one-dimensional (1D) long-column chromatography separation. While this separation type involves a single dimension, the use of long columns in conjunction with the significantly reduced complexity of the peptide mixture (i.e., enriched for N- and/or C-terminal peptides) allows satisfactory resolution of the constituent peptides to be achieved. Apart from the length of the column, separation modes suitable for the 1D long-column chromatography include any chromatography types described above and elsewhere in this specification, such as preferably but without limitation, reversed phase high performance liquid chromatography (RP-HPLC), hydrophobic interaction chromatography (HIC), normal-phase HPLC (NP-HPLC) or hydrophilic interaction liquid chromatography (HILIC).
As used herein, "long-column chromatography" refers to columnar chromatography, preferably employing liquid mobile phase, more preferably HPLC, using a stationary phase column having length of at least 75 cm, more preferably at least 1 metre, e.g., at least 1.5 m, even more preferably at least 2 m, e.g., at least 2.5 m, and most preferably up to 3 m or even more.
Otherwise, wherein chromatography is used within the multi-dimensional separation process, such chromatography, particularly employing liquid mobile phase, more particularly HPLC, can preferably use stationary phase columns having lengths common for peptide separations, such as between about 3 cm and about 50 cm, more preferably between about 5 cm and about 30 cm, even more preferably between about 10 cm and 25 cm. It shall be however appreciated that long columns may be in principle also applicable to the present multidimensional separations.
It shall be appreciated that the invention is also directed to a device or system that is able to carry out the methods of the invention, in particular the methods as above comprising the isolation of N-terminal and/or C-terminal peptides, followed by multidimensional separation thereof, and optionally identification of peptides therefrom. Hence, the invention relates to a system for sorting peptides comprising: a first chromatographic column for isolating N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM), and two or more downstream chromatographic columns for separating the N-terminal and/or C-terminal peptides into a plurality of fractions in a multidimensional separation process as described herein. The invention also relates to a system for sorting peptides comprising: a first chromatographic column for isolating N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM), and a downstream long chromatographic column for separating the N-terminal and/or C-terminal peptides into a plurality of fractions in a 1D long-column chromatography separation. Preferably, the first chromatographic column may be an ion exchange column, more preferably a cation exchange column, even more preferably an SCX column. Preferably, the system may be configured to perform any two or more or all above peptide sorting and separation steps "in-line", i.e., by directly feeding desired analytes from a previous separation element to the subsequent separation element.
These and further aspects and preferred embodiments of the invention are described in the following sections and in the appended claims.
BRIEF DESCRIPTION OF FIGURES
Figure 1 illustrates a flow-chart of a particular method for isolating peptides comprising C-terminal ends of proteins from a protein peptide mixture.
Figure 2 depicts the stoichiometry of the reaction of p-hydroxyphenylglyoxal with the guanidino group of an arginine residue in a peptide chain (R, R in this figure depict the remaining portions of the peptide chain).
Figure 3 depicts the stoichiometry of the reaction of nitromalondialdehyde (NMA) with the guanidino group of an arginine residue in a peptide chain (R, R in this figure depict the remaining portions of the peptide chain).
Figure 4 represents results obtained in an experiment for isolating C-terminal peptides.
Figure 5 illustrates orthogonality of RP separations performed at different pH. Both separations involve the combination of phenyl (1st dimension) and C18-RPLC (2nd dimension) chromatography. Vertical axis: 1st dimension; horizontal axis: 2nd dimension. In (A) both dimensions were operated at low pH; in (B) the phenyl column was operated at high pH and the C18 column at low pH, resulting in improved orthogonality.
Figure 6 illustrates 2D orthogonal separation of SCX-sorted N- and C-terminal peptides as described in the examples. The analysis of 24 first-dimension fractions is presented. Vertical axis: 1st dimension; horizontal axis: 2nd dimension.
Figure 7 substantiates reproducibility of the peptide sorting and identification approach described in the examples. A triplicate experiment was performed using the same sample treated in parallel prior to depletion. 89% of the quantifiable features are present in at least 2 of the 3 samples.
Figure 8 illustrates orthogonality provided by ZIC-HILIC (1st dimension) and RPLC (2nd dimension). Upper panel: 1st dimension; lower panel: 2nd dimension of fractions indicated in upper panel.
DETAILED DESCRIPTION OF THE INVENTION
Unless otherwise defined, all terms used in disclosing the invention, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By means of further guidance, term definitions are included to better appreciate the teaching of the present invention. When specific terms are defined here below or in connection with a particular aspect or embodiment, such connotation may apply throughout this specification such as also in the context of other aspects or embodiments, unless otherwise defined.
As used herein, the singular forms "a", "an", and "the" include both singular and plural referents unless the context clearly dictates otherwise.
The terms "comprising", "comprises" and "comprised of" as used herein are synonymous with "including", "includes" or "containing", "contains", and are inclusive or open-ended and do not exclude additional, non-recited members, elements or method steps.
The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within that range, as well as the recited endpoints.
The term "about" as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, is meant to encompass variations of +/-20% or less, preferably +/-10% or less, more preferably +/-5% or less, even more preferably +/-1% or less, and still more preferably +/-0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. However, it is to be understood that the value to which the modifier "about" refers is itself also specifically, and preferably, disclosed. All documents cited in the present specification are hereby incorporated by reference in their entirety.
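By way of non-limiting illustration, the tiered tolerances recited above for the term "about" can be expressed as a simple numerical check. The following is a minimal sketch only; the function names `within_about` and `tightest_tier` are hypothetical and are introduced here solely for illustration.

```python
# Tolerance tiers recited for "about", as fractions of the specified value.
ABOUT_TIERS = (0.20, 0.10, 0.05, 0.01, 0.001)


def within_about(value, specified, tolerance=0.20):
    """Return True if value lies within +/- tolerance (a fraction) of specified."""
    return abs(value - specified) <= tolerance * abs(specified)


def tightest_tier(value, specified):
    """Return the narrowest recited tier that still covers value, or None if
    value falls outside the widest (+/-20%) tier."""
    covering = [t for t in ABOUT_TIERS if within_about(value, specified, t)]
    return min(covering) if covering else None
```

For example, a column length of 28 cm would fall within "about 25 cm" at the +/-20% tier, whereas 31 cm would not.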
The term "protein" as used herein refers to naturally or recombinantly produced macromolecules comprising one or more polypeptide chains, i.e., polymeric chains of amino acid residues linked by peptide bonds. The term thus encompasses monomeric proteins, as well as protein dimers (hetero- as well as homo-dimers) and protein multimers (hetero- as well as homo-multimers). Further, the term also encompasses proteins that carry one or more co- or post-expression modifications of the polypeptide chain(s), such as, e.g., glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc. In addition, the term includes nascent protein chains as well as partly or wholly folded proteins, misfolded proteins, partly or wholly unfolded or denatured proteins, and may also cover coalesced or aggregated proteins, in particular where the latter are amenable to proteolysis. The term further also includes protein variants or mutants which carry amino acid sequence variations vis-a-vis a corresponding native protein, such as, e.g., amino acid deletions, additions and/or substitutions. The term contemplates both full-length proteins and protein parts, preferably naturally-occurring protein parts, that ensue from further processing of said full-length proteins. The methods of the invention are suitable for analysing individual proteins (such as, e.g., proteins isolated using SDS-PAGE or 2D-electrophoresis, etc.) as well as mixtures of proteins, including complex mixtures.
While the term encompasses proteins of all sizes and molecular weights, the methods of the invention may be preferably suited for analysing proteins or mixtures of proteins, wherein average and/or median length of the polypeptide chain(s) is at least about 20 amino acids, preferably at least about 50 amino acids, more preferably at least about 100 amino acids, even more preferably at least about 200 amino acids, or even at least about 500 amino acids or more.
The invention is particularly suitable for analysing mixtures of proteins, including complex protein mixtures. The terms "mixture of proteins" or "protein mixture" generally refer to a mixture of two or more different proteins, e.g., a composition comprising said two or more different proteins.
Preferably, a mixture of proteins to be analysed herein, such as a mixture by fragmentation of which a protein peptide mixture as used herein may be derived, may include more than about 10, preferably more than about 50, even more preferably more than about 100, yet more preferably more than about 500 different proteins, such as, e.g., more than about 1000 or more than about 5000 different proteins, or preferably even more than about 10,000 or 20,000 or 30,000 or more different proteins. An exemplary complex protein mixture may involve, without limitation, all or a fraction of proteins present in a biological sample or part thereof.
The terms "peptide" or "protein peptide" as used herein generally refer to fragments of a protein derived by fragmentation of said protein or of any one or more of its polypeptide chains, into two or more fragments. While the terms encompass peptides of any sizes and molecular weights, peptides and protein peptide mixtures preferred in the invention may have average and/or median length of less than about 200 amino acids, e.g., less than about 150 amino acids, preferably less than about 100 amino acids, e.g., less than about 90 amino acids, less than about 80 amino acids, less than about 70 amino acids or less than about 60 amino acids, and even more preferably less than about 50 amino acids, e.g., less than about 40 amino acids or less than about 30 amino acids. In further embodiments, peptides and protein peptide mixtures preferred in the invention may have average and/or median length of at least about 5 amino acids, preferably at least about 10 amino acids, even more preferably at least about 15 amino acids, e.g., at least about 20 amino acids. Hence, in yet further embodiments, peptides and protein peptide mixtures preferred in the invention may have average and/or median length of between about 5 and about 200 amino acids, preferably between about 5 and about 100 amino acids, also preferably between about 10 and about 100 amino acids, even more preferably between about 10 and about 50 amino acids, e.g., between about 10 and about 40 amino acids or between about 10 and about 30 amino acids. Such peptide sizes are particularly amenable to analysis using the methods of the invention. As used herein, the terms "peptide mixture" or "mixture of peptides" generally refer to a mixture of two or more different peptides, e.g., a composition comprising said two or more different peptides. The term "protein peptide mixture" generally refers to a mixture of peptides derived from a protein or from a mixture of two or more different proteins (i.e., protein mixture).
In addition, the term "protein peptide mixture" may also encompass peptide mixtures that include only a portion of all peptides obtained by fragmentation of a protein or a mixture of proteins, e.g., by fragmentation of all or a part of proteins present in a biological sample. For example, said portion of peptides may be selected from said all peptides on the basis of one or more selection criteria of interest, such as, without limitation, molecular weight, net charge, hydrophilicity and/or hydrophobicity of the constituent peptides, before being subjected to the methods of the invention. For example, a protein peptide mixture may be derived from a complex mixture of proteins, such as, e.g., from all or a fraction of proteins present in a biological sample or part thereof. In a preferred embodiment, a protein peptide mixture may be thus obtained by fragmentation of all or a fraction of proteins present in and/or isolated from a biological sample after the sample has been obtained or removed from a biological source. By means of example, the proteins may be fragmented so as to yield protein peptide mixtures having preferred average or median chain lengths as detailed above. It can be expected that, depending on the number of different proteins subjected to the fragmentation, their average or median size and the incidence of fragmentation thereof, the resulting protein peptide mixtures may easily comprise up to 1,000, 5,000, 10,000, 20,000, 30,000, 50,000, 100,000, 200,000, 300,000 or more different peptides.
However, in particular cases, the protein peptide mixture can also originate directly from a biological sample. For example, it is known that urine comprises, besides proteins, a very complex peptide mixture resulting from proteolytic degradation of proteins in the body and elimination of the resulting peptides via the kidneys. Yet another illustration is the mixture of peptides present in the cerebrospinal fluid. Accordingly, in embodiments, the method may employ protein peptide mixtures obtained from biological samples without further fragmentation in vitro.
The terms "biological sample" or "sample" as used herein generally refer to material, in a non-purified or purified form, obtained from a biological source. By means of example and not limitation, samples may be obtained from: viruses, e.g., viruses of prokaryotic or eukaryotic hosts; prokaryotic cells, e.g., bacteria or archaea, e.g., free-living or planktonic prokaryotes or colonies or bio-films comprising prokaryotes; eukaryotic cells or organelles thereof, including eukaryotic cells obtained from in vivo or in situ or cultured in vitro; eukaryotic tissues or organisms, e.g., cell-containing or cell-free samples from eukaryotic tissues or organisms; eukaryotes may comprise protists, e.g., protozoa or algae, fungi, e.g., yeasts or molds, plants and animals, e.g., mammals, humans or non-human mammals. A biological sample may thus encompass, for instance, a cell, tissue, organism, or extracts thereof. A biological sample may be preferably removed from its biological source, e.g., from an animal such as a mammal, human or non-human mammal, by suitable methods, such as, without limitation, collection or drawing of urine, saliva, sputum, semen, milk, mucus, sweat, faeces, etc., drawing of blood, cerebrospinal fluid, interstitial fluid, optic fluid (vitreous) or synovial fluid, or by tissue biopsy, resection, etc. A biological sample may be further subdivided to isolate or enrich for parts thereof to be used for obtaining proteins for analysing using the methods of the invention. By means of example and not limitation, diverse tissue types may be separated from each other; specific cell types or cell phenotypes may be isolated from a sample, e.g., using FACS sorting, antibody panning, laser-capture dissection, etc.; cells may be separated from interstitial fluid, e.g., blood cells may be separated from blood plasma or serum; or the like. The sample can be applied to the method directly or can be processed, extracted or purified to varying degrees before being used.
The sample can be derived from a healthy subject or a subject suffering from a condition, disorder, disease or infection. For example, without limitation, the subject may be a healthy animal, e.g., human or non-human mammal, or an animal, e.g., human or non-human mammal, who has cancer, an inflammatory disease, autoimmune disease, metabolic disease, CNS disease, ocular disease, cardiac disease, pulmonary disease, hepatic disease, gastrointestinal disease, neurodegenerative disease, genetic disease, infectious disease or viral infection, or other ailment(s).
Preferably, protein mixtures derived from biological samples may be treated to deplete highly abundant proteins therefrom, in order to increase the sensitivity and performance of proteomics analyses. By means of example, mammalian such as human serum or plasma samples may include abundant proteins, inter alia albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin and fibrinogen, which may preferably be so-depleted from the samples. Methods and systems for removal of abundant proteins are known, such as, e.g., immuno-affinity depletion, and frequently commercially available, e.g., Multiple Affinity Removal System (e.g., MARS-7, MARS-14) from Agilent Technologies (Santa Clara, California).
The term "subset of peptides" out of a protein peptide mixture denotes a fraction of the total set of peptides present in the protein peptide mixture. Such a fraction can preferably amount to 50% or less of the total set of peptides in the protein peptide mixture, e.g., to less than 40% or less than 30% of said total set of peptides, more preferably to less than 20% of said total set of peptides, e.g., less than 15% of said total set of peptides, and even more preferably to less than 10% of said total set of peptides, such as, e.g., to less than 8%, less than 6%, less than 4%, less than 2%, less than 1%, less than 0.1% or even less than 0.01% of said total set of peptides in the protein peptide mixture. Advantageously, this reduced size of the subset of peptides vis-a-vis the total set of peptides in the protein peptide mixture decreases the complexity of the former and thereby improves its accessibility to further analysis, e.g., proteomic analysis.
The term "isolating or enriching a subset of peptides out of a protein peptide mixture" generally means setting apart or separating said subset of peptides from the remaining peptides of the protein peptide mixture, such that said subset of peptides can be identified, analysed and/or recovered (e.g., in a composition or in purified form) separately from said remaining peptides of the protein peptide mixture. Hence, preferably the term "isolating" denotes a process of separating the recited peptides from other peptides of a protein peptide mixture (PPM), such that said recited peptides can be identified, quantified, analysed and/or recovered (e.g., in a composition or in purified form) separately from said other peptides. Preferably, a process of isolation may recover at least about 50%, e.g., at least about 60%, and more preferably substantially all of the recited peptides present in the protein peptide mixture (PPM). Also preferably, the group of so-isolated peptides may comprise less than about 50%, e.g., less than about 40%, more preferably less than about 30%, even more preferably less than about 20%, still more preferably less than about 10%, and yet more preferably less than about 5%, such as, e.g., less than about 4%, 3%, 2% or 1 % or even down to 0% of peptides from the protein peptide mixture (PPM) other than the recited peptides. High isolation efficiency and specificity can increase robustness of the methods and ensure more pronounced reduction in the complexity of the sample for further analysis.
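By way of non-limiting illustration, the recovery and specificity figures recited above for an isolation process can be computed as follows. This is a minimal sketch which assumes, for simplicity, that peptides are counted as distinct identities rather than by amount; the function name `isolation_metrics` is hypothetical.

```python
def isolation_metrics(recited_in_mixture, isolated):
    """Return (recovery, contamination) for an isolation step.

    recovery: fraction of the recited peptides present in the PPM that
        were found in the isolated group.
    contamination: fraction of the isolated group made up of peptides
        other than the recited peptides.
    Assumption: peptides are counted as distinct identities, not amounts.
    """
    recited = set(recited_in_mixture)
    isolated = set(isolated)
    if not recited or not isolated:
        raise ValueError("both peptide sets must be non-empty")
    recovery = len(recited & isolated) / len(recited)
    contamination = len(isolated - recited) / len(isolated)
    return recovery, contamination
```

Under this sketch, an isolation that recovers three of four recited peptides together with one other peptide would show a recovery of 75% and a contamination of 25%.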
A peptide bond "adjacent" to a given amino acid residue may be the peptide bond which involves the Cα amino group of said amino acid residue and the Cα carboxyl group of the previous amino acid residue ("N-terminally adjacent" peptide bond), or the peptide bond which involves the Cα carboxyl group of said amino acid residue and the Cα amino group of the following amino acid residue ("C-terminally adjacent" peptide bond). By means of illustration, in a sequence of residues (AA-1)AA(AA+1), the peptide bond N-terminally adjacent to residue AA is indicated with an arrowhead (▼), and the peptide bond C-terminally adjacent to residue AA is indicated with an arrow (↓):
NH2-Cα(-R(AA-1))C(=O)-▼-NH-Cα(-R(AA))C(=O)-↓-NH-Cα(-R(AA+1))C(=O)OH
The term "fragmented preferentially at" means that the fragmentation occurs substantially only at the recited peptide bond(s). Preferably, less than 20% of peptide bonds other than the recited ones would be cleaved, e.g., less than 15%, more preferably less than 10%, e.g., less than 7%, even more preferably less than 5%, e.g., less than 4%, less than 3% or less than 2%, and most preferably less than 1%, e.g., less than 0.5%, less than 0.1%, or less than 0.01% or even less.
When referring to a group of members or entities throughout this specification, "substantially all" means 70% or more, e.g., 75% or more, preferably 80% or more, e.g., 85% or more, more preferably 90% or more, even more preferably 95% or more, and most preferably at least 96%, at least 97%, at least 98%, at least 99% or even 100% of said members or entities.
The term "fragmentation" as used herein in relation to a protein refers to cleavage, preferably enzymatic or chemical cleavage, of one or more peptide bonds within said protein or within any one or more of its polypeptide chains. Fragmentation of protein mixture denotes fragmentation of proteins constituting said protein mixture. Advantageously, proteins or protein mixtures may be fragmented so as to yield protein peptide mixtures having the preferred average or median chain lengths as detailed above.
When a protein or a polypeptide chain is cleaved at least at one peptide bond, such fragmentation generates a peptide that comprises the N-terminal end of said protein or polypeptide chain ("N-terminal peptide") and a peptide that comprises the C-terminal end of said protein or polypeptide chain ("C-terminal peptide"). Where the protein or polypeptide chain is cleaved at two or more of its peptide bonds, such fragmentation additionally produces one or more peptides derived from the portion of the protein or polypeptide chain interposed between the parts corresponding to the N- and C-terminal peptides ("internal peptides").
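By way of non-limiting illustration, the classification of fragments into N-terminal, internal and C-terminal peptides described above can be sketched as follows. The function name `fragment` and the bond-indexing convention (bond i lies between residues i and i+1, counting residues from 1) are assumptions made for this illustration only.

```python
def fragment(sequence, cleaved_bonds):
    """Split a polypeptide chain at the given peptide bonds and label each piece.

    Bond index i denotes the peptide bond between residue i and residue i+1
    (residues counted from 1 at the N-terminus).
    """
    n = len(sequence)
    cuts = sorted(set(cleaved_bonds))
    if not cuts:
        raise ValueError("at least one peptide bond must be cleaved")
    if cuts[0] < 1 or cuts[-1] > n - 1:
        raise ValueError("bond index out of range")
    bounds = [0] + cuts + [n]
    pieces = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
    # The first piece carries the N-terminus and the last piece the C-terminus;
    # everything in between is an internal peptide.
    labels = ["N-terminal"] + ["internal"] * (len(pieces) - 2) + ["C-terminal"]
    return list(zip(labels, pieces))
```

A single cleavage thus yields exactly one N-terminal and one C-terminal peptide, and each additional cleavage adds one internal peptide.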
Preferably, fragmentation as intended herein of the protein (P) or the mixture of proteins (PM) to achieve a protein peptide mixture (PPM) may be effected by suitable physical, chemical and/or enzymatic agents, more preferably chemical and/or enzymatic agents, even more preferably enzymatic agents, e.g., proteinases, preferably endoproteinases. Preferably, the fragmentation may be achieved by one or more, preferably one, protease (proteolytic enzyme), more preferably by one or more, preferably one, endoprotease (endopeptidase, proteinase, endoproteinase), i.e., a protease cleaving internally within a polypeptide chain. A non-limiting list of endoproteinases suitable for such fragmentation includes endoproteinases selected from serine proteinases (EC 3.4.21 ), threonine proteinases (EC 3.4.25), cysteine proteinases (EC 3.4.22), aspartic acid proteinases (EC 3.4.23), metalloproteinases (EC 3.4.24) and glutamic acid proteinases.
By means of example and not limitation, protein fragmentation may be achieved using trypsin, chymotrypsin, elastase, Lysobacter enzymogenes endoproteinase Lys-C, Staphylococcus aureus endoproteinase Glu-C (endopeptidase V8) or Clostridium histolyticum endoproteinase Arg-C (clostripain). Table 1 lists specificities of these exemplary endoproteases.
Table 1
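[Table 1, listing the cleavage specificities of the exemplary endoproteases, is provided as an image in the original publication.]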
The invention encompasses the use of any further known or yet to be identified enzymes; a skilled person can choose suitable protease(s) on the basis of their cleavage specificity to achieve desired protein peptide mixtures of the invention.
Preferably, the fragmentation as intended herein may be effected by endopeptidases of the trypsin type (EC 3.4.21.4), preferably trypsin, such as, without limitation, preparations of trypsin from bovine pancreas, human pancreas, porcine pancreas, recombinant trypsin, Lys-acetylated trypsin, etc. Trypsin highly specifically cleaves peptide bonds C-terminally adjacent to arginine and lysine residues (except where the following residue is Pro), and also cleaves C-terminally adjacent to homoarginine residues, albeit at a slower rate. The invention also contemplates the use of any trypsin-like protease, i.e., with a similar specificity to that of trypsin. Trypsin is particularly useful in proteomics applications, inter alia due to high specificity and efficiency of its cleavage. In other embodiments, chemical reagents may be used to fragment proteins into peptides. For example, CNBr can fragment proteins at Met; BNPS-skatole can fragment at Trp. Alternatively, chemical fragmentation can also be achieved by limited protein hydrolysis under acidic conditions.
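By way of non-limiting illustration, the cleavage specificity of trypsin stated above (C-terminally adjacent to Lys and Arg, except where the following residue is Pro) can be sketched as an in silico digestion of a one-letter amino acid sequence. This is a simplified sketch which assumes complete cleavage at every eligible bond and does not model homoarginine or missed cleavages; the function name `tryptic_digest` is hypothetical.

```python
def tryptic_digest(sequence):
    """Cleave C-terminally adjacent to K or R, except when the next residue is P.

    Assumes complete digestion (no missed cleavages); homoarginine is not
    modelled. Returns the peptides in N- to C-terminal order.
    """
    if not sequence:
        return []
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # The remainder after the final cleavage site is the C-terminal peptide.
    peptides.append(sequence[start:])
    return peptides
```

In this sketch the first returned peptide is the N-terminal peptide and the last is the C-terminal peptide of the digested chain.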
The conditions for treatment, e.g., protein concentration, enzyme or chemical reagent concentration, pH, buffer, temperature, time, can be determined by the skilled person depending on the enzyme or chemical reagent employed. As used herein, the term "reacting" generally refers to bringing together of designated reactants under conditions that allow a desired chemical transformation, such that compound(s) different from the reactants initially introduced to the reaction are generated.
As used herein, the term "chromatography" encompasses methods for separating chemical substances, referred to as such and widely available in the art. In a preferred approach, chromatography refers to a process in which a mixture of chemical substances (analytes) carried by a moving stream of liquid or gas ("liquid phase" or "mobile phase") is separated into components as a result of differential distribution of the solutes or analytes, as they flow around or over a stationary liquid or solid phase ("stationary phase"), between said liquid or mobile phase and said stationary phase. The stationary phase is usually a finely divided solid, a sheet of filter material, or a thin film of a liquid on the surface of a solid, or the like. Chromatography is also widely applicable for the separation of chemical compounds of biological origin, such as, e.g., amino acids, proteins, fragments of proteins, peptides, phospholipids, steroids, etc. Exemplary types of chromatography include, without limitation, high-performance liquid chromatography (HPLC), normal phase chromatography (NP) such as NP-HPLC, reversed phase chromatography (RP) such as RP-HPLC, ion exchange chromatography, such as cation or anion exchange chromatography, hydrophilic interaction chromatography (HILIC), hydrophobic interaction chromatography (HIC), size exclusion chromatography (SEC) including gel filtration chromatography or gel permeation chromatography, chromatofocusing, affinity chromatography such as immuno-affinity and immobilised metal affinity chromatography. While particulars of these chromatography types are well known in the art, for further guidance see, e.g., Meyer M., 1998, ISBN: 047198373X and Cappiello et al. 2001 (Mass Spectrom Rev 20: 88-104), incorporated herein by reference. In preferred embodiments, the chromatography may be columnar, i.e., wherein the stationary phase is deposited or packed in a column.
The migration of solutes in chromatography can be expressed as "elution time" or "retention time", being the time elapsed between the start of the chromatographic separation and the moment at which a solute of interest emerges at a given distance along the stationary phase where it is detected and/or collected.
The term "basic amino acid" generally refers to amino acids, preferably α-amino acids, more preferably α-L-amino acids, such as present in proteins, wherein the pKa of the protonated form of their side chain (i.e., pKR) is about 4 or greater (≥ 4), preferably about 5 or greater (≥ 5), and more preferably about 6 or greater, e.g., about 8 or greater (≥ 8) or about 10 or greater (preferably, e.g., ≥ 9 or ≥ 10). Particularly preferred basic amino acids include lysine, arginine, histidine and homoarginine, preferably Lys, Arg and His. Generally, the side chain of basic amino acids comprises a basic moiety, such as, e.g., the imidazole moiety of histidine, the ε-amino moiety of lysine or the guanidino moiety of arginine and homoarginine.
The term "net charge" refers to the arithmetic sum of the charges of all the atoms taken together for a molecule. The term "about zero net charge" encompasses zero net charge but may also encompass small deviations therefrom, such as charges between -0.2 and +0.2, more preferably between -0.1 and +0.1 or even more preferably between -0.05 and +0.05.
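By way of non-limiting illustration, a net charge and the "about zero net charge" window recited above can be estimated as follows. This is a sketch under stated assumptions: the fractional charge of each ionisable group is estimated with the Henderson-Hasselbalch relationship, and the pKa values used are approximate, illustrative values that are not taken from this specification (published values differ between sources). The names `net_charge` and `is_about_zero` are hypothetical.

```python
# Approximate, illustrative pKa values (assumptions; published values vary).
ASSUMED_PKA_BASIC = {"N-term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
ASSUMED_PKA_ACIDIC = {"C-term": 2.0, "D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph, pka_basic=ASSUMED_PKA_BASIC,
               pka_acidic=ASSUMED_PKA_ACIDIC):
    """Estimate the average net charge of an unmodified peptide at a given pH.

    Each basic group contributes its protonated fraction (positive) and each
    acidic group its dissociated fraction (negative).
    """
    groups = ["N-term", "C-term"] + list(sequence)
    charge = 0.0
    for group in groups:
        if group in pka_basic:
            charge += 1.0 / (1.0 + 10 ** (ph - pka_basic[group]))
        elif group in pka_acidic:
            charge -= 1.0 / (1.0 + 10 ** (pka_acidic[group] - ph))
    return charge


def is_about_zero(charge, tolerance=0.2):
    """Return True if charge lies between -tolerance and +tolerance."""
    return -tolerance <= charge <= tolerance
```

The `tolerance` parameter may be tightened to 0.1 or 0.05 to reflect the more preferred windows recited above.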
The term "majority" is synonymous with "substantially all" as defined herein and refers to 70% or more, e.g., 75% or more, preferably 80% or more, e.g., 85% or more, more preferably 90% or more, even more preferably 95% or more, and most preferably at least 96%, at least 97%, at least 98%, at least 99% or even 100%. The terms "dissociated" and "protonated" refer to ionisation states of an atom or a moiety, wherein "dissociation" denotes the loss of H+, such as by acidic moieties and "protonation" denotes acceptance of H+, in particular by basic moieties.
The term "strong cation exchange" or "strong acid cation exchange" or "SCX" chromatography refers to cation exchange chromatography (preferably columnar chromatography or solid phase extraction techniques inter alia SCX using solid phase extraction cartridges, magnetic or centrifugable SCX beads, etc.) using strong acid cation exchange resins, as well-known in the art, preferably using a stationary phase that maintains constant net negative charge in the range of pH about 2-12, preferably about 1-14, or even substantially irrespective of pH. For example, SCX stationary phase may include solid support functionalised with strong acidic groups, such as preferably sulphonic acid groups. By means of example and not limitation, such resins may be of gelular or macroporous type and may contain strong acidic groups, such as preferably sulphonic acid groups, in the free acid (H-form) or neutralised (salt-form, for example, sodium or potassium salts) state. A non-limiting example hereof may be, e.g., wide-pore silica packing with a bonded coating of hydrophilic polymer, e.g., poly(2-sulfoethyl aspartamide); see, e.g., Crimmins et al. 1988 (J Chromatogr 443: 63-71). Typically, elution of solutes in SCX chromatography can be achieved with salt solutions, such as, e.g., NaCl, KCl or (NH4)2SO4 gradients. Commercially available SCX columns may be used herein, such as without limitation ones summarised in Table 2:
Table 2
[Table 2, summarising commercially available SCX columns, is provided as an image in the original publication.]
The term "strong anion exchange" or "SAX" chromatography generally refers to anion exchange chromatography (preferably columnar chromatography or solid phase extraction techniques inter alia SAX using solid phase extraction cartridges, magnetic or centrifugable SAX beads, etc.), using a stationary phase that maintains constant net positive charge in the range of pH about 2-12, preferably about 1-14, or even substantially irrespective of pH. Non-limiting examples of SAX stationary phases include solid supports functionalised with quaternary ammonium groups, such as inter alia -CH2CH2N+(CH2CH3)2CH2CH(OH)CH3.
Whenever the term "substituted" is used herein, it means that one or more hydrogens on the atom (typically a C-, N-, O- or S-atom, usually a C-atom) indicated by the modifier "substituted" are replaced with a selection from the specified group, provided that the indicated atom's normal valence is not exceeded, and that the substitution results in a chemically stable compound, i.e., a compound that is sufficiently robust to survive preparation and/or isolation to a useful degree of purity. The term "one or more" covers the possibility of all the available atoms, where appropriate, to be substituted, preferably, one, two or three. When any variable, e.g., halogen or alkyl, occurs more than one time in any constituent, each definition is independent. As used herein, "-" when in between two atoms, indicates a single bond between the said atoms; "=" when in between two atoms, indicates a double bond between the said atoms.
As used herein, "-" when projecting from an atom of a substituting radical, indicates point(s) of attachment of the said substituting radical to the atom being substituted with the said radical. The term "single bond" refers to the direct joining by a single covalent bond of the substituents flanking (preceding and succeeding) the variable taken as a "single bond". In particular, where the variable X in formula (I) is said to be a single bond, this refers to direct joining by a single covalent bond of the two carbon atoms flanking X in formula (I).
The term "alkyl" as used herein alone or as part of another group refers to (preferably monofunctional) saturated straight or branched hydrocarbon radicals. Exemplary alkyl radicals include inter alia C1-20 alkyls, C1-10 alkyls, C1-6 alkyls or C1-4 alkyls, such as without limitation methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, octyl, octadecyl and the like.
The term "C1-6 alkyl", alone or as part of another group, means a mono-functional (monovalent) saturated branched or un-branched hydrocarbon radical of between 1 and 6, e.g., 1, 2, 3, 4, 5 or 6, carbon atoms. C1-6 alkyl radicals encompass, for example, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, isoamyl, and the like.
The term "C1-4 alkyl", alone or as part of another group, means a mono-functional (monovalent) saturated branched or un-branched hydrocarbon radical of between 1 and 4, e.g., 1, 2, 3 or 4, carbon atoms. C1-4 alkyl radicals encompass, for example, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl and tert-butyl radicals, and the like.
The term "C1-4 haloalkyl", alone or as part of another group, refers to C1-4 alkyl radical as defined herein in which at least one hydrogen atom on the C1-4 alkyl radical is replaced by a halogen atom, preferably -F, -Cl, -Br or -I, more preferably -F, -Cl or -Br, even more preferably -F or -Cl.
The term "C1-4 perhaloalkyl", alone or as part of another group, refers to C1-4 alkyl radical as defined herein in which all hydrogen atoms on the C1-4 alkyl radical are replaced by same or different halogen atoms, preferably -F, -Cl, -Br or -I, more preferably -F, -Cl or -Br, even more preferably -F or -Cl.
The term "C1-4 alkylene", alone or as part of another group, means a bi-functional (bivalent) saturated branched or un-branched hydrocarbon radical of between 1 and 4, e.g., 1, 2, 3 or 4, carbon atoms. C1-4 alkylene radicals encompass, e.g., methylene, ethylene, propylene, methylethylene, butylene, and the like.
The term "C3-8 cycloalkyl", alone or as part of another group, means a mono-functional (monovalent) saturated or partially unsaturated, monocyclic, bi-cyclic or polycyclic hydrocarbon radical wherein each cyclic moiety contains between 3 and 8, e.g., 3, 4, 5, 6, 7 or 8, carbon atoms. A partially unsaturated C3-8 cycloalkyl radical contains at least one double bond in at least one of its cyclic moieties. C3-8 cycloalkyl radical may be preferably monocyclic or bi-cyclic, more preferably monocyclic. Examples of monocyclic C3-8 cycloalkyl radicals include, without limitation, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cycloheptyl, cyclooctyl, cyclopentenyl, cyclohexenyl, and the like.
The term "aryl", alone or as part of another group, means a mono-functional (monovalent), monocyclic, bi-cyclic or tri-cyclic aromatic hydrocarbon radical. Preferably, each cyclic moiety of an aryl radical may contain 6 carbon atom ring members. Preferably, an aryl radical is a phenyl, diphenyl or naphthyl radical, more preferably phenyl radical. Naphthyl radical encompasses, e.g., 1- or 2-naphthyl radicals.
The term "heterocyclyl", alone or as part of another group, means a mono-functional (monovalent), monocyclic, bi-cyclic or polycyclic, saturated or partially unsaturated, heterocyclic radical. Preferably, each cyclic moiety of a heterocyclyl radical contains between 3 and 12 ring members, more preferably between 5 and 10 ring members and still more preferably 5 to 6 ring members. At least one cyclic moiety of a heterocyclyl radical, preferably more than one and more preferably each cyclic moiety of the heterocyclyl radical, contains one or more heteroatom ring members selected from nitrogen, oxygen or sulphur. Exemplary heterocyclyl radicals include, without limitation, mono-functional radicals of dihydropyrrole, tetrahydropyrrole, dihydrofuran, tetrahydrofuran, dihydrothiophene, tetrahydrothiophene, piperidine, pyran, dihydropyran, tetrahydropyran, piperazine, oxazine, dioxane, dithiane, and the like.
The term "heteroaryl", alone or as part of another group, means a (preferably mono-functional, i.e., monovalent) monocyclic, bi-cyclic or tri-cyclic aromatic heterocyclic radical. Preferably, each cyclic moiety of a heteroaryl radical contains between 3 and 12 ring members, more preferably between 5 and 10 ring members and still more preferably 5 to 6 ring members. At least one cyclic moiety of a heteroaryl radical, preferably more than one and more preferably each cyclic moiety of the heteroaryl radical, contains one or more heteroatom ring members selected from nitrogen, oxygen or sulphur. Exemplary heteroaryl radicals include, without limitation, mono-functional radicals of pyridine, pyrrole, imidazole, pyrazole, oxazole, thiazole, furan, pyridazine, pyrimidine, pyrazine, thiophene, and the like.
Where the reference to a single group contains two or more radical names, this denotes, alone or as part of another group, a radical as defined by the last-named radical, wherein at least one hydrogen atom on said last-named radical is replaced by the previous-named radical. In a general example, the group "radical 2 radical 1" depicts, alone or as part of another group, the radical 1, in which at least one hydrogen atom on the radical 1 is replaced by radical 2. For example, by way of illustration only, "aryl C1-4 alkyl", alone or as part of another group, means a C1-4 alkyl radical as defined herein, in which at least one hydrogen atom on the C1-4 alkyl radical is replaced by an aryl radical as defined herein.
The following terms define further radicals, in each instance either alone or as part of another group: "C1-6 alkoxy" or "C1-6 alkyloxy" means a radical of the formula -O-C1-6 alkyl, wherein C1-6 alkyl is as defined herein; "C1-4 alkoxy" or "C1-4 alkyloxy" means a radical of the formula -O-C1-4 alkyl, wherein C1-4 alkyl is as defined herein; "aryloxy" means a radical of the formula -O-aryl, wherein aryl is as defined herein; "heteroaryloxy" means a radical of the formula -O-heteroaryl, wherein heteroaryl is as defined herein; "heterocyclyloxy" means a radical of the formula -O-heterocyclyl, wherein heterocyclyl is as defined herein; "C1-3 alkylthio" means a radical of the formula -S-C1-3 alkyl, wherein C1-3 alkyl is as defined herein; "C1-4 alkylthio" means a radical of the formula -S-C1-4 alkyl, wherein C1-4 alkyl is as defined herein; "arylthio" means a radical of the formula -S-aryl, wherein aryl is as defined herein; "heteroarylthio" means a radical of the formula -S-heteroaryl, wherein heteroaryl is as defined herein; "heterocyclylthio" means a radical of the formula -S-heterocyclyl, wherein heterocyclyl is as defined herein; "C1-3 alkylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with a C1-3 alkyl as defined herein; "C1-4 alkylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with a C1-4 alkyl as defined herein; "arylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with aryl as defined herein; "heteroarylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with heteroaryl as defined herein; "heterocyclylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with heterocyclyl as defined herein; "C1-6 alkanoyl" means an acyl radical of the formula -C(=O)C1-6 alkyl, wherein C1-6 alkyl is as defined herein; "C1-6 alkanoyloxy" means a radical of the formula -OC(=O)C1-6 alkyl, wherein C1-6 alkyl is as defined herein; "C1-4 alkanoyl" means an acyl radical of the formula -C(=O)C1-4 alkyl, wherein C1-4 alkyl is as defined herein; "C1-4 alkanoyloxy" means a radical of the formula -OC(=O)C1-4 alkyl, wherein C1-4 alkyl is as defined herein; "aroyl" means an acyl radical of the formula -C(=O)aryl, wherein aryl is as defined herein; "aroyloxy" means a radical of the formula -OC(=O)aryl, wherein aryl is as defined herein; "heteroaroyl" means an acyl radical of the formula -C(=O)heteroaryl, wherein heteroaryl is as defined herein; "heteroaroyloxy" means a radical of the formula -OC(=O)heteroaryl, wherein heteroaryl is as defined herein; "heterocyclylcarbonyl" means an acyl radical of the formula -C(=O)heterocyclyl, wherein heterocyclyl is as defined herein; "heterocyclylcarbonyloxy" means a radical of the formula -OC(=O)heterocyclyl, wherein heterocyclyl is as defined herein; "C1-6 alkanoylthio" means a radical of the formula -S-C1-6 alkanoyl, wherein C1-6 alkanoyl is as defined herein; "C1-4 alkanoylthio" means a radical of the formula -S-C1-4 alkanoyl, wherein C1-4 alkanoyl is as defined herein; "aroylthio" means a radical of the formula -S-aroyl, wherein aroyl is as defined herein; "heteroaroylthio" means a radical of the formula -S-heteroaroyl, wherein heteroaroyl is as defined herein; "heterocyclylcarbonylthio" means a radical of the formula -S-heterocyclylcarbonyl, wherein heterocyclylcarbonyl is as defined herein; "C1-6 alkanoylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with a C1-6 alkanoyl as defined herein; "C1-4 alkanoylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with a C1-4 alkanoyl as defined herein; "aroylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with aroyl as defined herein; "heteroaroylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with heteroaroyl as defined herein; "heterocyclylcarbonylamino" means a secondary or tertiary amino radical, wherein a hydrogen atom of the amino radical is replaced with heterocyclylcarbonyl as defined herein.
For further guidance, the following lists the radicals (in parentheses) denoted by the respective terms used herein, when used alone or as part of another group: "hydrogen" (-H); "hydroxy" or "hydroxyl" (-OH); "oxo" (=O); "formyl" (-C(=O)H); "carboxy" or "carboxyl" (interchangeably, -C(=O)OH or -C(=O)O⁻); "mercapto" (-SH); "fluoro" (-F); "chloro" (-Cl); "bromo" (-Br); "amino" (-NH2 or -NH3⁺); "substituted amino" (-NRaRb wherein Ra and Rb are each independently hydrogen or C1-4 alkyl); "nitro" (-NO2); "C1-4 alkyloxycarbonyl" (-C(=O)-O-C1-4 alkyl); "peroxy" (-O-O-).
As explained, the first group of aspects of the invention concerns a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of:
(a) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture (PPM); and
(b) isolating a subset (S) of peptides from said protein peptide mixture (PPM), comprising the steps of: reacting the protein peptide mixture (PPM) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset (S) of peptides unaltered by said reacting.
In the present method, the protein (P) or the mixture of proteins (PM) is fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types (denoted as X1, X2,... Xn) to obtain the protein peptide mixture (PPM).
When the protein (P) or the protein mixture (PM) is fragmented preferentially C-terminally adjacent to one or more select types of amino acid residues X1, X2,... Xn, this can create a distinction between peptides derived from N-termini and internal portions of the protein (P) or the protein mixture (PM), which will contain the one or more amino acid residues X1, X2,... or Xn as their last residue, and peptides comprising the C-terminal ends from the protein (P) or the protein mixture (PM), most of which will not contain the one or more amino acid residues X1, X2,... or Xn (or at least not as their last residue). As disclosed herein, said distinction allows the C-terminal peptides of the protein (P) or protein mixture (PM) to be separated and isolated. In addition, such preferential fragmentation at specific peptide bonds may allow the resulting peptides and relevant properties thereof, e.g., charge, size, molecular weight, etc., to be predicted, e.g., in silico. This information can be used to identify the peptides, in particular the isolated peptides comprising the C-terminal ends from the protein (P) or the protein mixture (PM), and consequently deduce the identity of the proteins subjected to the fragmentation.
In a preferred embodiment, the protein (P) or protein mixture (PM) may be fragmented at substantially all peptide bonds C-terminally adjacent to amino acid residues of the one or more types X1, X2,... Xn present in said protein (P) or protein mixture (PM). Hence, the fragmentation occurs substantially quantitatively after all amino acid residues of the one or more types X1, X2,... Xn. Most peptides comprising the C-terminal ends from the protein (P) or protein mixture (PM) will thus not comprise any of the one or more amino acid residues X1, X2,... and Xn. In this embodiment, the method of the invention may subsequently employ either specific modification or specific removal of the one or more amino acid residue types X1, X2,... and Xn from the obtained peptides to discriminate away N-terminal and internal peptides as taught herein. This embodiment also advantageously produces smaller-size peptides.
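By way of illustration only, the in silico prediction mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the invention: proteins are written in one-letter amino acid code, fragmentation is idealised as quantitative cleavage after every residue of the chosen types, and the function names `fragment` and `unaltered_subset` as well as the toy sequence are hypothetical.

```python
def fragment(protein, residues):
    """Idealised fragmentation: cleave C-terminally to every residue in `residues`."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        if aa in residues:
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Whatever follows the last cleavage site is the C-terminal peptide.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def unaltered_subset(peptides, residues):
    """Peptides containing none of the residue types X1..Xn.

    These are the peptides an X-specific agent would leave unaltered,
    i.e. the candidate C-terminal peptides (subset S).
    """
    return [p for p in peptides if not any(aa in residues for aa in p)]


# Hypothetical toy protein; X1 = Lys (K), X2 = Arg (R).
toy_protein = "MKTAYRAGLKDEFSW"
x_types = {"K", "R"}

ppm = fragment(toy_protein, x_types)
subset_s = unaltered_subset(ppm, x_types)
print(ppm)
print(subset_s)
```

In this toy case every N-terminal or internal peptide ends in K or R, and only the final fragment lacks both residue types. A protein whose own last residue is K or R would yield no peptide in the subset, which is the loss discussed in connection with the number of residue types n.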
In another embodiment, the protein (P) or protein mixture (PM) may be fragmented at substantially all peptide bonds C-terminally adjacent to the one or more amino acid residue types X1, X2,... Xn only when the respective residue X1, X2,... or Xn forms a part of a specific sequence element, e.g., a sequence element of ≤10, preferably ≤7, more preferably ≤5, even more preferably ≤3 and most preferably 2 amino acids, preferably consecutive amino acids. While peptides comprising the C-terminal ends from the protein (P) or protein mixture (PM) may thus comprise the one or more amino acid residues X1, X2,... and/or Xn, in most C-terminal peptides such residue(s) will not be the last residue. In this embodiment, the method of the invention may subsequently employ specific removal (e.g., preferably by a carboxypeptidase) of the one or more amino acid residues X1, X2,... and Xn from the obtained peptides to discriminate away N-terminal and internal peptides, as taught herein.
In preferred embodiments, the protein (P) or the mixture of proteins (PM) may be fragmented at peptide bonds C-terminally adjacent to fewer than 5 (n<5), more preferably fewer than 4 (n<4), even more preferably fewer than 3 (n<3), still more preferably fewer than 2 (n<2) different types of amino acid residues X1, X2,... Xn, and most preferably C-terminally adjacent to only 1 (n=1) amino acid residue type X1. This reduces the probability that the actual C-terminal residue of the analysed proteins would be any of said one or more residue types X1, X2,... or Xn, and thus decreases the fraction of true C-terminal peptides not recovered by the present method.
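The effect of the number of residue types n on recovery can be illustrated with a short calculation. The sketch below is an assumption-laden toy example: sequences are in one-letter code, the four sequences are invented, and `fraction_not_recovered` is a hypothetical helper that simply counts proteins whose own C-terminal residue belongs to the chosen residue types.

```python
def fraction_not_recovered(proteins, residues):
    """Fraction of proteins whose true C-terminal residue is one of X1..Xn.

    The C-terminal peptides of such proteins end in an X-type residue and
    would therefore be altered together with N-terminal and internal peptides.
    """
    lost = sum(1 for p in proteins if p and p[-1] in residues)
    return lost / len(proteins)


# Invented toy proteome (one-letter code).
toy_proteins = ["AAK", "CCR", "DDE", "GGW"]

for x_types in ({"R"}, {"K", "R"}, {"K", "R", "E"}):
    print(sorted(x_types), fraction_not_recovered(toy_proteins, x_types))
```

With these invented sequences the lost fraction grows as residue types are added, in line with the preference for small n stated above.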
The one or more specific amino acid residue types X1, X2,... Xn downstream of which fragmentation is contemplated herein may be selected from any amino acid residues, including, but not limited to, amino acids found in naturally occurring proteins, amino acids carrying a post-translational modification, amino acids including a non-natural isotope, or amino acids further chemically and/or enzymatically altered prior to the fragmentation, etc.
For example, in an embodiment, the protein (P) or the mixture of proteins (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types X1, X2,... Xn, wherein said one or more amino acid residue types X1, X2,... Xn are chosen from the group consisting of: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser, Thr, homoarginine, 4-hydroxyproline, ε-N,N,N-trimethyllysine, 3-methylhistidine, 5-hydroxylysine, O-phosphoserine, γ-carboxyglutamate, ε-N-acetyllysine, ω-N-methylarginine, N-acetylserine, N,N,N-trimethylalanine, citrulline, ornithine and homocysteine.
Preferably, to obtain relatively frequent fragmentation, said one or more amino acid residue types X1, X2,... Xn may be chosen from the 20 amino acid types commonly found in natural proteins: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser or Thr.
In a further embodiment, to ensure that peptides comprising the one or more amino acid residue types X1, X2,... or Xn can be modified by agents of the invention, said amino acid residue types X1, X2,... Xn may be preferably selected from amino acid types whose side chain comprises comparably reactive moieties.
Accordingly, in a preferred embodiment, the one or more amino acid residue types X1, X2,... Xn may comprise, in their side chain, a moiety chosen from mercapto (-SH); alkylthio (-S-alkyl), preferably C1-3 alkylthio, more preferably methylthio or ethylthio, most preferably methylthio; hydroxyphenyl (-C6H4OH), more preferably p-hydroxyphenyl; primary amino (-NH2); secondary amino (-NH-), preferably indyl, pyrrolidinyl or imidazyl, more preferably indyl or imidazyl; guanidino (-NH-C(=NH)NH2); ureyl (-NH-C(=O)NH2); and carboxyl (-COOH). In a further preferred embodiment, the one or more amino acid residue types X1, X2,... Xn may comprise a moiety chosen from mercapto; methylthio; p-hydroxyphenyl; primary amino; indyl; imidazyl; guanidino; and carboxyl. In further preferred embodiments, the one or more amino acid residue types X1, X2,... Xn may comprise a moiety chosen from mercapto and methylthio; or chosen from primary amino, indyl, imidazyl and guanidino; or chosen from primary amino and guanidino; or chosen from guanidino; or chosen from p-hydroxyphenyl; or chosen from carboxyl. Hence, in further embodiments, the one or more amino acid residue types X1, X2,... Xn may be chosen from the group consisting of Met, Cys, Tyr, Trp, His, Lys, hArg, Arg, Glu and Asp; or the group consisting of Met, Cys, Tyr, Trp, His, Lys, Arg, Glu and Asp; or the group consisting of His, Lys and Arg; or the group consisting of Lys, Arg and hArg; or the group consisting of Lys and Arg; or the group consisting of Arg and hArg; or the group consisting of Met and Cys; or the group consisting of Tyr and Trp; or the group consisting of Asp and Glu.
The present method further comprises reacting the protein peptide mixture (PPM) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn.
In the methods of the invention, the protein peptide mixture (PPM) or portion thereof may be reacted with a suitable agent under conditions allowing said agent to react with and desirably modify or alter the one or more amino acid residues X1, X2,... Xn in peptides comprising such. Preferably, as a result of said reacting, peptides that comprise said one or more amino acid residues X1, X2,... or Xn may be modified to form an adduct between the one or more amino acid residues X1, X2,... or Xn and one or more molecules of the modifying agent. More preferably, the resulting adduct is stable, such that it persists in substantially all so modified peptides, and more preferably in substantially all so modified amino acid residues X1, X2,... Xn, under the physical (e.g., temperature, light) and chemical (e.g., pH, ionic strength, solvents) conditions used in subsequent steps of the method of the invention, such as, e.g., in the step employed to separate the unaltered peptides from the altered peptides.
The term "specifically modify" means that as a result of said reacting, peptides that comprise the one or more amino acid residues X1, X2,... or Xn will be modified by the agent, whereas substantially all peptides that do not include the one or more amino acid residues X1, X2,... or Xn will remain unmodified by said agent. Such specificity may be achieved, e.g., when the modifying agent at the conditions of the reaction reacts with the one or more amino acid residues X1, X2,... or Xn but substantially not with any other type of amino acid residues, or when amino acid residues other than the one or more X1, X2,... or Xn, that would normally be reactive with said agent, are suitably blocked in the protein (P), protein mixture (PM) or protein peptide mixture (PPM) before said reacting.
Preferably, said reacting is quantitative, i.e., as a result thereof, substantially all peptides that comprise the one or more amino acid residues X1, X2,... or Xn will become modified with the modifying agent used. Moreover, as a result of said reacting, substantially all of the one or more amino acid residues X1, X2,... and Xn present in peptides of the protein peptide mixture can become modified.
Alternatively, the protein peptide mixture (PPM) or portion thereof may be reacted with a suitable agent under conditions allowing said agent to remove the one or more amino acid residues X1, X2,... Xn from peptides comprising such. As used herein, "removal" of a given amino acid residue from a peptide refers to cleaving, e.g., hydrolysing, of the peptide bond(s) that connect said residue to the remainder of the peptide. In the present method, the residue X1, X2,... or Xn to be removed will typically be the last (i.e., -COOH end) residue of the respective peptides, and the removal of said residue X1, X2,... or Xn will involve cleavage of the peptide bond N-terminally adjacent to said residue X1, X2,... or Xn.
The term "specifically remove" means that the agent will remove the one or more amino acid residue types X1, X2,... or Xn from peptides comprising such, whereas substantially all peptides that do not include the one or more amino acid residues X1, X2,... or Xn will not be altered by said agent. Such specificity may be achieved, e.g., when the agent at the conditions of the reaction removes the one or more amino acid residues X1, X2,... or Xn but substantially not any other type of amino acid residues, or when amino acid residues other than the one or more X1, X2,... or Xn, that would normally be removed by said agent, are suitably blocked in the protein (P), protein mixture (PM) or protein peptide mixture (PPM) before said reacting.
Preferably, said reacting is quantitative, i.e., as a result thereof, substantially all peptides of the protein peptide mixture (PPM) that comprise the one or more amino acid residues X1, X2,... or Xn will become altered with the agent used. Moreover, as a result of said reacting, substantially all of the one or more amino acid residues X1, X2,... and Xn present in peptides of the protein peptide mixture (PPM) can be removed.
The terms "alter", "altering", "altered" or "alteration" as used herein in relation to a peptide refer to the introduction of a specific change to said peptide by reacting the peptide with agents of the invention as defined herein. Such alteration may involve a specific chemical and/or enzymatic modification to or removal of one or more amino acids of a peptide. Preferably, introduction of said specific alteration allows the altered and unaltered peptides to be subsequently distinguished and/or separated by suitable methods, such as, e.g., by chromatography.
The invention contemplates any suitable agent capable of specifically modifying the one or more amino acid residue types X1, X2,... Xn in peptides of the protein peptide mixture (PPM), as taught herein. Without limitation, the ensuing exemplary embodiments can serve as further guidance to selection of suitable agents:
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain primary amino group, such as, e.g., Lys, 5-hydroxylysine or ornithine, preferably Lys, exemplary modifying agents may comprise 1-fluoro-2,4-dinitrobenzene (e.g., reacting to dinitrophenyl-Lys), trinitrobenzene sulphonic acid (e.g., reacting to trinitrophenyl-Lys), ethylthiotrifluoroacetate (e.g., reacting to trifluoroacetyl-Lys) or succinic anhydride (e.g., reacting to succinyl-Lys); and preferably O-methylisourea (e.g., guanidinylation of the side chain -NH2 groups), which preferentially modifies the side chain -NH2 groups and does not substantially modify α-NH2 groups of peptides;
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain guanidino group, such as, e.g., Arg or hArg, preferably Arg, exemplary modifying agents may comprise a dicarbonyl compound or derivative thereof, a peptidylarginine deiminase or an arginase;
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain mercapto group, such as, e.g., Cys or homocysteine, exemplary modifying agents may comprise iodoacetate (e.g., reacting to S-carboxymethyl-Cys), 1-fluoro-2,4-dinitrobenzene (e.g., reacting to S-dinitrophenyl-Cys), N-ethylmaleimide, p-hydroxymercuribenzoate, 5,5'-dithiobis(2-nitrobenzoic acid), or performic acid (e.g., reacting to cysteic acid);
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain alkylthio group, preferably methylthio group, such as, e.g., Met, exemplary modifying agents may comprise cyanogen bromide (reacting to peptidyl homoserine lactone), iodoacetate (reacting to S-carboxymethyl-Met) or performic acid (reacting to methionine sulphone);
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain carboxyl group, such as, e.g., Asp, Glu or γ-carboxyglutamate, preferably Asp or Glu, exemplary modifying agents may comprise diazomethane (reacting to methyl ester) or glycine methyl ester (reacting to an amide);
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain imidazyl group, such as, e.g., His or 3-methylhistidine, preferably His, exemplary modifying agents may comprise iodoacetate or diethylpyrocarbonate (reacting to ethylcarboxamido-His);
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain indyl group, such as, e.g., preferably Trp, exemplary modifying agents may comprise 2,4-dinitrophenylsulphenyl chloride or N-bromosuccinimide;
- when one of the one or more amino acid residues X1, X2,... Xn is a residue comprising a side chain hydroxyphenyl group, such as, e.g., Tyr, exemplary modifying agents may comprise tetranitromethane (reacting to 3-nitrotyrosine).
The invention further also contemplates any suitable agent capable of specifically removing the one or more amino acid residue types X1, X2,... Xn from peptides of the protein peptide mixture (PPM), as taught herein. Without limitation, the following exemplary embodiments can serve as further guidance to selection of such suitable agents:
- when one of the one or more amino acid residues X1, X2,... Xn is Arg, Lys or hArg, preferably Arg or Lys, and said residue is the last (i.e., -COOH end) residue of a peptide, exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase B (EC 3.4.17.2; see, e.g., Folk 1970. Methods Enzymol 19: 504-508), carboxypeptidase U (EC 3.4.17.20; see, e.g., Eaton et al. 1991. J Biol Chem 266: 21833-21838) or carboxypeptidase D (EC 3.4.16.6). Carboxypeptidase B, also known as protaminase or pancreatic carboxypeptidase B, has been isolated from a variety of sources, such as pancreas of cattle, pig and dogfish, etc., and all its origins and forms, including any recombinantly produced forms thereof, are contemplated for use herein;
- when one of the one or more amino acid residues X1, X2,... Xn is a basic amino acid, preferably Arg or Lys, more preferably Lys, and said residue is the last residue of a peptide, exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase N (EC 3.4.17.3; see, e.g., Plummer & Erdös 1981. Methods Enzymol 80: 442-449);
- when one of the one or more amino acid residues X1, X2,... Xn is Glu, and said residue is the last residue of a peptide, exemplary agents to specifically remove said residue from the peptide may comprise carboxypeptidase G (EC 3.4.17.11; see, e.g., Goldman & Levy 1967. PNAS 58: 1299-1306).
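The discrimination achieved by specific removal can be sketched in a few lines. The example below is illustrative only and rests on several assumptions: peptides are in one-letter code, the agent is idealised as removing exactly one C-terminal residue of the specified types (a real carboxypeptidase may continue to trim consecutive susceptible residues), and the helper names and toy peptides are hypothetical.

```python
def remove_c_terminal(peptide, residues):
    """Idealised specific removal of a single C-terminal residue.

    Returns (resulting peptide, altered flag). Only a residue at the
    -COOH end is removed; internal residues of the same type are untouched.
    """
    if peptide and peptide[-1] in residues:
        return peptide[:-1], True
    return peptide, False


def isolate_unaltered(peptides, residues):
    """Subset S: peptides left unaltered by the removing agent."""
    return [p for p in peptides if not remove_c_terminal(p, residues)[1]]


# Toy peptide mixture; the agent is assumed to remove C-terminal Lys (K) or Arg (R).
toy_ppm = ["MK", "TAKYR", "AGLK", "DKEFSW"]
basic = {"K", "R"}

for pep in toy_ppm:
    print(pep, remove_c_terminal(pep, basic))
print(isolate_unaltered(toy_ppm, basic))
```

The last toy peptide contains an internal Lys yet remains unaltered, which mirrors the embodiment in which C-terminal peptides may contain a residue X1, X2,... or Xn as long as it is not their last residue.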
In a preferred embodiment, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to amino acid residue(s) that comprise a guanidino moiety.
The term "guanidino", alone or as part of another group, means a guanidine radical of the formula -NHC(=NH)NH2, or any protonated or dissociated form thereof, such as, e.g., -NHC(=NH2⁺)NH2.
Preferably, the fragmentation may be C-terminally adjacent to Arg and/or hArg (wherein hArg may be obtained in the proteins by suitably converting the side chains of lysine residues thereto before proteolysis), and/or C-terminally adjacent to Lys, wherein the lysine may be advantageously converted to hArg subsequent to said fragmentation. Lys can be converted to hArg by methods known in the art, such as, e.g., by guanidinylation of the ε-NH2 groups of Lys with O-methylisourea, e.g., as disclosed in Plapp et al. 1971 (J Biol Chem 246: 939-945).
Hence, in related embodiments, the protein (P) or protein mixture (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to:
- Arg residues (i.e., n=1, X1=Arg); or
- Arg and/or hArg residues, preferably Arg and hArg (i.e., n=2, X1=Arg, X2=hArg); or
- Arg and/or Lys residues, preferably Arg and Lys residues (i.e., n=2, X1=Arg, X2=Lys), wherein Lys residues are advantageously subsequently modified to hArg; or
- Arg and/or hArg and/or Lys residues, preferably Arg and hArg and Lys residues (i.e., n=3, X1=Arg, X2=hArg, X3=Lys), wherein Lys residues are advantageously subsequently modified to hArg.
Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin, which cleave highly specifically after Arg and Lys and, to a lesser extent, after hArg. If cleavage after Lys is not desired, as in some of the above embodiments, Lys residues may be suitably blocked before said proteolysis, as described elsewhere in this specification. The invention also contemplates the use of any trypsin-like protease, i.e., with a similar specificity to that of trypsin.
In the so obtained protein peptide mixture (PPM), peptides comprising a guanidino moiety will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides lacking a guanidino moiety will mostly originate from and comprise the C-terminal ends of the starting protein(s). To allow these peptide subsets to be discriminated, the invention contemplates particularly advantageous manners to modify peptides that include residues with a guanidino moiety.
As noted above, the protein (P) or protein mixture (PM) may be subjected to chemical and/or enzymatic pre-treatment before fragmentation. Alternatively or in addition, the protein peptide mixture (PPM) may be subjected to chemical and/or enzymatic pre-treatment before the primary run or before reacting with an agent of the invention. For example, pre-treatments may allow broadening of the spectrum of classes of peptides which can be isolated with the invention. Alternatively, pre-treatments may alter the specificity of the fragmentation, or may block reactive groups of the peptide to prevent their reactivity under the conditions of the method, or may affect the modification of peptides with the agent of the invention, etc.
For example, the protein (P), mixture of proteins (PM) or protein peptide mixture (PPM) may be exposed to one or more blocking reagents, simultaneously or sequentially, which reagents may preferably fall into the following classes: modifiers of protein primary amines; modifiers of protein primary amines only present in amino acid side chains; or modifiers of cysteine residues.
Suitable blocking reagents, as well as methods and conditions for attaching the blocking groups will be clear to the skilled person and are generally described in the standard handbooks of organic chemistry, such as Greene and Wuts, "Protective groups in organic synthesis", 3rd Edition, Wiley and Sons, 1999, which is incorporated herein by reference in its entirety.
In a non-limiting example, ε-NH2 groups of lysine residues, but at least optionally not the N-terminal α-NH2 group, in the protein (P) or protein mixture (PM), can be converted to guanidino groups using O-methylisourea, before fragmentation C-terminally to arginine and homoarginine residues, preferably using trypsin type protease. Alternatively, lysine residues may be so-converted to homoarginine only after fragmentation C-terminally adjacent to arginine and lysine residues, e.g., using trypsin type protease. This allows the combination of the advantageous digestion with trypsin with the ability to modify substantially all non-C-terminal peptides with agents of the invention that specifically modify guanidino groups.
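The second alternative, conversion of Lys to hArg after tryptic fragmentation, can be sketched as below. This is a hedged illustration rather than a described protocol: peptides are modelled as lists of three-letter residue names, guanidinylation is idealised as quantitative and side-chain specific, and the names `guanidinylate`, `has_guanidino` and the toy peptides are assumptions made for the example.

```python
# Residue types whose side chain carries a guanidino moiety.
GUANIDINO_RESIDUES = {"Arg", "hArg"}


def guanidinylate(peptide):
    """Idealised conversion of every Lys side chain to hArg."""
    return ["hArg" if aa == "Lys" else aa for aa in peptide]


def has_guanidino(peptide):
    """True if a guanidino-specific agent would find a residue to modify."""
    return any(aa in GUANIDINO_RESIDUES for aa in peptide)


# Toy tryptic peptides (cleavage after Lys and Arg), three-letter code.
toy_ppm = [
    ["Met", "Lys"],
    ["Thr", "Ala", "Tyr", "Arg"],
    ["Asp", "Glu", "Phe", "Ser", "Trp"],
]

converted = [guanidinylate(p) for p in toy_ppm]
modifiable = [p for p in converted if has_guanidino(p)]
c_terminal_candidates = [p for p in converted if not has_guanidino(p)]

print(modifiable)
print(c_terminal_candidates)
```

Before conversion the Lys-terminated peptide would escape a guanidino-specific agent; after conversion every peptide ending in Lys or Arg carries a guanidino group, leaving only the toy C-terminal peptide unmodified.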
In another embodiment, primary amines, in particular the ε-NH2 groups of lysine and/or the N-terminal α-NH2 group, will be blocked with a blocking reagent that reacts with primary amines and presents a non-reactive substituent for subsequent steps. The blocking reagent may be generally substituted once or twice on each primary amine. In a non-limiting but preferred example, primary amines may be blocked with acetyl N-hydroxysuccinimide resulting in acetylation of the primary amines, as known in the art. Preferably, ε-NH2 groups of lysines may be so-blocked in the protein (P) or protein mixture (PM) prior to fragmentation with a trypsin-type endoprotease. Because trypsin generally does not cleave after acetylated lysines, the protein (P) or protein mixture (PM) will be fragmented only C-terminally to arginine residues. Hence, said acetylation avoids the need to guanidinylate lysine residues to homoarginine and also produces a different variety of peptides. Other agents known to modify ε-NH2 and/or α-NH2 groups in proteins may include, e.g., 1-fluoro-2,4-dinitrobenzene (FDNB), trinitrobenzene sulphonic acid, ethylthiotrifluoroacetate and succinic anhydride.
In a yet further embodiment, the -SH groups of cysteine side chains of the protein (P) or protein mixture (PM) or of the peptides of the protein peptide mixture (PPM) may be blocked, to avoid their reactivity, e.g., susceptibility to oxidation, in subsequent steps of the method. The blocking reagent can be any that reacts selectively with cysteine side chains and presents a non-reactive substituent for subsequent reactions. By means of example, and not limitation, the sample may be treated with tributylphosphine, followed by iodoacetamide in protein denaturing buffers, leading to acetamide derivatisation of cysteine side chains; otherwise, cysteine -SH groups can be blocked by alkylation as known in the art. Other agents known to modify -SH groups of cysteine in proteins may include, e.g., 1-fluoro-2,4-dinitrobenzene (FDNB) and N-ethylmaleimide.
The treatment with a primary amine blocker may occur prior to treatment with an -SH-group blocker, or vice versa. After treatment with a blocking reagent, the resulting sample may then optionally be purified, using techniques known in the art, such as evaporation of solvent, washing, filtration, and chromatographic techniques, such as column chromatography (e.g., disposable preparative cartridge), etc.
In a preferred embodiment, peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety may be modified by reacting with a dicarbonyl compound or derivative thereof.
The term "dicarbonyl compound" generally refers to any compound, in particular an organic compound, comprising two or more carbonyl groups, wherein the term "carbonyl", alone or as part of another group, refers to a bi-functional (bivalent) group of the formula -C(=O)-. In an embodiment, the dicarbonyl compound may comprise two carbonyl groups. In exemplary, non-limiting embodiments, said at least two carbonyl groups may be two aldehyde groups (dialdehyde), two ketone groups (diketone), or one aldehyde and one ketone group (aldehyde-ketone), or at least one of said at least two carbonyl groups may be a part of a carboxyl group (e.g., aldehyde-acid, keto-acid), an ester group (e.g., aldehyde-ester, keto-ester) or a thioester group. Preferably, the dicarbonyl compound may be a dialdehyde, a diketone or an aldehyde-ketone compound. The term "derivative" generally denotes that one or more atoms of said dicarbonyl compound may be substituted with one or more same or different functional groups.
In a preferred embodiment, said dicarbonyl compound or a derivative thereof is a molecule of the formula (I):
[Figure: structural formula (I), R1-C(=O)-X-C(=O)-R2]
wherein X is chosen from the group comprising or consisting of:
- a single bond;
- C1-4 alkylene, optionally substituted with one or more hydroxy, oxo, formyl, carboxy, mercapto, fluoro, chloro, bromo, amino, substituted amino (-NRaRb wherein Ra and Rb are each independently hydrogen or C1-4 alkyl), nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, C1-4 alkyloxycarbonyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, or heterocyclyl C1-4 alkyl;
- -O- (oxygen); peroxy; -NH- (secondary amino) optionally substituted with C1-4 alkyl; and -S- (sulphur);
wherein R1 and R2 are, each independently, chosen from the group comprising or consisting of:
- hydrogen (-H);
- C1-6 alkyl, C3-8 cycloalkyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, heterocyclyl C1-4 alkyl, C1-6 alkoxy, aryloxy, heteroaryloxy, heterocyclyloxy, aryl C1-4 alkoxy, heteroaryl C1-4 alkoxy, heterocyclyl C1-4 alkoxy, C1-6 alkylthio, arylthio, heteroarylthio, heterocyclylthio, aryl C1-4 alkylthio, heteroaryl C1-4 alkylthio, heterocyclyl C1-4 alkylthio, C1-6 alkylamino, arylamino, heteroarylamino, heterocyclylamino, aryl C1-4 alkylamino, heteroaryl C1-4 alkylamino, heterocyclyl C1-4 alkylamino, C1-6 alkanoyloxy, aroyloxy, heteroaroyloxy, heterocyclylcarbonyloxy, aryl C1-4 alkanoyloxy, heteroaryl C1-4 alkanoyloxy, heterocyclyl C1-4 alkanoyloxy, C1-6 alkanoylthio, aroylthio, heteroaroylthio, heterocyclylcarbonylthio, aryl C1-4 alkanoylthio, heteroaryl C1-4 alkanoylthio, heterocyclyl C1-4 alkanoylthio, C1-6 alkanoylamino, aroylamino, heteroaroylamino, heterocyclylcarbonylamino, aryl C1-4 alkanoylamino, heteroaryl C1-4 alkanoylamino, heterocyclyl C1-4 alkanoylamino, each group being optionally substituted with one or more hydroxy, oxo, formyl, carboxy, mercapto, fluoro, chloro, bromo, amino, substituted amino (-NRaRb wherein Ra and Rb are each independently hydrogen or C1-4 alkyl), nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, C1-4 alkyloxycarbonyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, or heterocyclyl C1-4 alkyl;
- hydroxy, mercapto, fluoro, chloro, bromo, amino, substituted amino (-NRaRb wherein Ra and Rb are each independently hydrogen or C1-4 alkyl).
In an embodiment ("E1"), X is chosen from the group comprising or consisting of:
- a single bond;
- C1-4 alkylene, optionally substituted with one or more Rc;
- -O-; peroxy; -NH- optionally substituted with C1-4 alkyl; and -S-.
As used throughout this specification to define particular sets of preferred substituents, "Rc" is, each independently, hydroxy, oxo, formyl, carboxy, mercapto, fluoro, chloro, bromo, amino, substituted amino, nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, C1-4 alkyloxycarbonyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl or heterocyclyl C1-4 alkyl; "Rd" is, each independently, hydroxy, fluoro, chloro, bromo, nitro, C1-4 alkyl, C1-4 haloalkyl, C1-4 perhaloalkyl, C1-4 alkoxy, C1-4 alkanoyl, C1-4 alkanoyloxy, aryl or heteroaryl; "Re" is, each independently, hydroxy, fluoro, chloro, nitro, C1-4 alkyl, C1-4 haloalkyl or C1-4 perhaloalkyl; "Rf" is, each independently, hydroxy, fluoro, chloro, nitro, C1-4 alkyl or C1-4 haloalkyl; "Rg" is, each independently, hydroxy, fluoro or chloro; and "Rh" is, each independently, hydroxy, fluoro, chloro or nitro.
In the above substituent sets Rc, Rd, Re and Rf, C1-4 alkyl, alone or as a part of another group, may preferably be C1-3 alkyl, more preferably methyl or ethyl; C1-4 haloalkyl, alone or as a part of another group, may preferably be C1-3 haloalkyl, more preferably halomethyl or haloethyl; C1-4 perhaloalkyl, alone or as a part of another group, may preferably be C1-3 perhaloalkyl, more preferably perhalomethyl or perhaloethyl; C1-4 alkoxy, alone or as a part of another group, may preferably be C1-3 alkoxy, more preferably methoxy or ethoxy; C1-4 alkanoyl, alone or as a part of another group, may preferably be C1-3 alkanoyl, more preferably formyl or acetyl; C1-4 alkanoyloxy, alone or as a part of another group, may preferably be C1-3 alkanoyloxy, more preferably formyloxy or acetoxy; C1-4 alkyloxycarbonyl, alone or as a part of another group, may preferably be C1-3 alkyloxycarbonyl, more preferably methoxycarbonyl or ethoxycarbonyl; aryl, alone or as a part of another group, may preferably be phenyl or naphthyl, more preferably phenyl; the cycle or cycles of the heteroaryl may preferably have 5 or 6 ring members and/or the heteroaryl may preferably comprise one or more N- and/or one or more O- heteroatoms; the cycle or cycles of the heterocyclyl may preferably have 5 or 6 ring members and/or the heterocyclyl may preferably comprise one or more N- and/or one or more O- heteroatoms.
In a preferred embodiment ("E2"), X is a single bond.
In another preferred embodiment ("E3"), X is C1-4 alkylene, preferably C1-3 alkylene, more preferably C1-2 alkylene, e.g., methylene or ethylene, even more preferably methylene, each group optionally substituted with one or more Rc, preferably with one or more Rd, more preferably with one or more Re, even more preferably with one or more Rf, yet more preferably with one or more Rh, and most preferably with nitro.
In a particularly preferred embodiment ("E4"), X is methylene substituted with nitro, preferably with only one nitro moiety, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof, such as, e.g., -C⁻(-NO2)-.
In an embodiment ("E5"), R1 and R2 are chosen, each independently, from the group comprising or consisting of:
- hydrogen (-H);
- C1-6 alkyl, C3-8 cycloalkyl, aryl, heteroaryl, heterocyclyl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, heterocyclyl C1-4 alkyl, C1-6 alkoxy, aryloxy, heteroaryloxy, heterocyclyloxy, aryl C1-4 alkoxy, heteroaryl C1-4 alkoxy, heterocyclyl C1-4 alkoxy, C1-6 alkylthio, arylthio, heteroarylthio, heterocyclylthio, aryl C1-4 alkylthio, heteroaryl C1-4 alkylthio, heterocyclyl C1-4 alkylthio, C1-6 alkylamino, arylamino, heteroarylamino, heterocyclylamino, aryl C1-4 alkylamino, heteroaryl C1-4 alkylamino, heterocyclyl C1-4 alkylamino, C1-6 alkanoyloxy, aroyloxy, heteroaroyloxy, heterocyclylcarbonyloxy, aryl C1-4 alkanoyloxy, heteroaryl C1-4 alkanoyloxy, heterocyclyl C1-4 alkanoyloxy, C1-6 alkanoylthio, aroylthio, heteroaroylthio, heterocyclylcarbonylthio, aryl C1-4 alkanoylthio, heteroaryl C1-4 alkanoylthio, heterocyclyl C1-4 alkanoylthio, C1-6 alkanoylamino, aroylamino, heteroaroylamino, heterocyclylcarbonylamino, aryl C1-4 alkanoylamino, heteroaryl C1-4 alkanoylamino, heterocyclyl C1-4 alkanoylamino, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg;
- hydroxy, mercapto, fluoro, chloro, bromo, amino, substituted amino;
and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
In a preferred embodiment ("E6"), each R1 and R2 is hydrogen (-H); and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4", preferably "E3" or "E4", more preferably "E4".
In another embodiment ("E7"), R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl, heteroaryl, aryl C1-4 alkyl, heteroaryl C1-4 alkyl, C1-6 alkoxy, aryloxy, heteroaryloxy, aryl C1-4 alkoxy, heteroaryl C1-4 alkoxy, C1-6 alkanoyloxy, aroyloxy, heteroaroyloxy, aryl C1-4 alkanoyloxy and heteroaryl C1-4 alkanoyloxy, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
In a further embodiment ("E8"), R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl, heteroaryl, aryl C1-4 alkyl and heteroaryl C1-4 alkyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
In another embodiment ("E9"), R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, C1-6 alkyl, aryl and heteroaryl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3", or "E4".
In another embodiment ("E10"), R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, aryl and heteroaryl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
In another embodiment ("E11"), R1 and R2 are chosen, each independently, from the group comprising or consisting of: hydrogen, aryl, preferably phenyl or naphthyl, more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above embodiments "E1", "E2", "E3" or "E4".
In any of the above embodiments "E5", "E6", "E7", "E8", "E9", "E10" or "E11":
- C1-6 alkyl, alone or as a part of another group, may preferably be C1-4 alkyl, more preferably C1-3 alkyl and even more preferably methyl or ethyl; and/or
- C1-6 alkanoyl, alone or as a part of another group, may preferably be C1-4 alkanoyl, more preferably C1-3 alkanoyl, even more preferably formyl or acetyl; and/or
- C1-4 alkyl, alone or as a part of a group, may preferably be C1-3 alkyl and more preferably methyl or ethyl; and/or
- C1-4 alkanoyl, alone or as a part of a group, may preferably be C1-3 alkanoyl, more preferably formyl or acetyl; and/or
- aryl, alone or as a part of a group, may preferably be phenyl or naphthyl, more preferably phenyl; and/or
- the cycle or cycles of said heteroaryl may have 5 or 6 ring members and/or the heteroaryl may comprise one or more N- and/or one or more O- heteroatoms; and/or
- the cycle or cycles of said heterocyclyl may have 5 or 6 ring members and/or the heterocyclyl may comprise one or more N- and/or one or more O- heteroatoms.
In a further embodiment ("E12"), at least one of R1 and R2 is hydrogen, preferably only one of R1 and R2 is hydrogen, and the other of R1 and R2 is chosen from the groups as defined in any of the preceding embodiments applicable to substituents R1 and R2; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
In a further embodiment ("E13"), neither R1 nor R2 is hydrogen; and R1 and R2 are chosen from the groups as defined in any of the preceding embodiments applicable to substituents R1 and R2; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
In a preferred embodiment ("E14"), at least one of R1 and R2, such as one or both of R1 and R2, and preferably only one of R1 and R2, is chosen from the group comprising or consisting of aryl and heteroaryl, and is preferably aryl, more preferably naphthyl or phenyl, even more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above "E1", "E2", "E3" or "E4".
In another preferred embodiment ("E15"), one of R1 and R2 is hydrogen and the other of R1 and R2 is chosen from the group comprising or consisting of aryl and heteroaryl, and is preferably aryl, more preferably naphthyl or phenyl, even more preferably phenyl, each group being optionally substituted with one or more Rc, more preferably with one or more Rd, even more preferably with one or more Re, still more preferably with one or more Rf, and most preferably with one or more Rg; and X is as defined in any one of above "E1", "E2", "E3", or "E4".
It shall be appreciated that the dicarbonyl compound or a derivative thereof of formula (I) can include various combinations of substituents X, R1 and R2 as constructed by combining the above embodiments.
In a further preferred embodiment ("E16") of the dicarbonyl compound or derivative thereof of formula (I), X is a single bond; R1 and R2 are, each independently, chosen from the group comprising or consisting of: hydrogen and aryl, preferably phenyl or naphthyl, more preferably phenyl, each group being optionally substituted with one or more Re, preferably with one or more Rf, more preferably with one or more Rg. Preferably, one, more preferably only one, or alternatively neither of R1 and R2 may be hydrogen.
In a particularly preferred embodiment ("E18"), the dicarbonyl compound or a derivative thereof is phenylglyoxal. In another embodiment ("E19"), it is hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal. These compounds are particularly effective for modification of guanidino groups in the present method.
In a further preferred embodiment ("E20") of the dicarbonyl compound or derivative thereof of formula (I), X is methylene substituted with one or more nitro, preferably with one nitro, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof; R1 and R2 are, each independently, as defined in any of embodiments "E5" to "E15" as above, preferably at least one of R1 and R2 is hydrogen.
In a further preferred embodiment ("E21") of the dicarbonyl compound or derivative thereof of formula (I), X is methylene substituted with one or more nitro, preferably with one nitro, more preferably X is -CH(-NO2)- or any protonated or dissociated form thereof; R1 and R2 are each hydrogen.
Hence, in a particularly preferred embodiment ("E22"), the dicarbonyl compound is nitromalondialdehyde (NMA), i.e., X is -CH(-NO2)- or any protonated or dissociated form thereof, e.g., -C⁻(-NO2)-; and R1 and R2 are each hydrogen. These compounds are particularly effective for modification of guanidino groups in the present method.
It can be appreciated that in some dicarbonyl compounds or derivatives thereof of formula (I), the -C(=O)-X-C(=O)- moiety may form a conjugated system as set forth in formulas (II) or (III):
[Figure: structural formulas (II) and (III)]
The term "conjugated system" refers to a system where a sequence of three or more atoms exhibits delocalised bonding over said three or more atoms, especially delocalisation of electrons across adjacent parallel aligned p-orbitals. By means of example and not limitation, a delocalised system can result in said dicarbonyl compound when X is a single bond, whereby p-electrons contributing to the two C=O double bonds delocalise.
When the -C(=O)-X-C(=O)- moiety forms a conjugated system, the dicarbonyl compound or derivative thereof, e.g., as individualised in any of the above embodiments, may display stereoisomerism with cis- or trans- arrangement of substituents R1 and R2, as generally illustrated in the above formulas (III) and (II), respectively. In a preferred embodiment, the moieties R1 and R2, e.g., in the dicarbonyl compound or derivative thereof as individualised in any of the above embodiments, are in trans as set forth in formula (II) above, which may provide for less steric hindrance there between.
By means of example, and not limitation, the dicarbonyl compound or a derivative thereof may be trans-hydroxyphenylglyoxal (IV):
[Figure: structural formula (IV)]
more preferably trans-p-hydroxyphenylglyoxal (V), i.e.:
[Figure: structural formula (V)]
Preferably, the dicarbonyl compound or derivative may form a stable adduct with the guanidino moiety, preferably an adduct as set forth in any of formulas (VIa) or (VIb), or (VIIa) or (VIIb), including any protonated or dissociated forms thereof:
[Figure: structural formulas (VIa), (VIb), (VIIa) and (VIIb)]
wherein the -NH-C(=N-)NH- or the -N=C(NH-)NH- groups represent the modified guanidino moiety; wherein R3 and R4 are any of R1 or R2 and when R3 is R1 then R4 is R2 and when R3 is R2 then R4 is R1; wherein R5 and R6 are any of R1 or R2 and when R5 is R1 then R6 is R2 and when R5 is R2 then R6 is R1; wherein R1, R2 and X have the same meaning as above.
In an embodiment, a preferred adduct may be the one set forth in formula (VIa) or (VIb) above; this type of adduct may show advantages, such as, e.g., greater stability and/or a greater impact on the properties of the altered peptides due to the presence of two molecules of the dicarbonyl compound or derivative therein. In a further embodiment, a preferred adduct may be any of the adducts set forth in formulas (VIIa) or (VIIb); this type of adduct or further products thereof may show advantages, such as, e.g., greater stability and/or a greater impact on the properties of the altered peptides. In a preferred embodiment, the adducts as set forth in any of formulas (VIIa) or (VIIb) may undergo dehydration, whereby one or more water molecules are eliminated from the adducts. By means of example and not limitation, reaction of an arginine residue (via its guanidino moiety) with nitromalondialdehyde (NMA), a preferred dicarbonyl compound of the invention, will eventually produce a δ-(5-nitro-2-pyrimidyl)ornithyl derivative (see Figure 3).
By means of example, but without limitation, adducts as set forth in formula (VIa) or (VIb) may be preferably obtained by reacting peptides of a protein peptide mixture with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E10", "E11"; also preferably "E14", "E15"; more preferably "E16" to "E19", such as, e.g., phenylglyoxal, preferably hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal, such as, preferably, trans-phenylglyoxal, more preferably trans-hydroxyphenylglyoxal, and even more preferably trans-p-hydroxyphenylglyoxal. By means of example, but without limitation, adducts as set forth in formulas (VIIa) or (VIIb), or further dehydration products thereof as explained, may be preferably obtained by reacting peptides of a protein peptide mixture with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E4" and "E6", more preferably "E20" to "E22", most preferably with nitromalondialdehyde (NMA).
One skilled in the art can in general optimise reaction conditions under which the guanidino moieties of peptides will form suitable adducts with the dicarbonyl compound or derivative thereof, such as, e.g., adducts as shown in formulas (VIa) or (VIb), or (VIIa) or (VIIb). Such conditions may include, without limitation, concentrations of the reactants, relative molar excess of the reactants, solvent, pH, buffer system, temperature, reaction time, presence of catalysts, stopping or quenching the reaction, etc. For example, by means of further guidance and not limitation, reaction conditions to obtain adducts between peptides and the dicarbonyl compound or derivative thereof, preferably adducts as in formulas (VIa) or (VIb), or (VIIa) or (VIIb), may include any or all of the following:
- molar excess of the dicarbonyl compound over peptides of at least 1.5x, preferably at least 2x, more preferably at least 5x, even more preferably at least 10x, still more preferably at least 20x, e.g., at least 30x or at least 40x, yet more preferably at least 50x, e.g., at least 60x, at least 70x, at least 80x or at least 90x, and most preferably at least 100x, e.g., at least 200x, or even at least 500x or more;
- aqueous solvent, preferably water;
- pH between 7 and 11, preferably between 7 and 10, e.g., about 7, about 8, about 9 or about 10, more preferably between 8 and 10, and even more preferably about 9, e.g., 9;
- temperature between 5°C and 80°C, preferably between 10°C and 60°C, more preferably between 15°C and 40°C, even more preferably between 20°C and 30°C, yet more preferably between 20°C and 25°C or between 25°C and 30°C, such as, e.g., ambient temperature or about 30°C;
- reaction in darkness;
- etc.
In a further preferred embodiment, peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety, more preferably such peptides comprising an Arg residue, may be specifically modified by reacting with peptidylarginine deiminase (EC 3.5.3.15; see, e.g., Fujisaki & Sugawara 1981. J Biochem (Tokyo) 89: 257-263). Peptidylarginine deiminases catalyse enzymatic deimination of the guanidino group, most preferably of Arg residues, to an ureyl group (Arg → citrulline).
In a further embodiment, peptides of the protein peptide mixture (PPM) that comprise a guanidino moiety, more preferably such peptides comprising an Arg residue, may be specifically modified by reacting with arginase (EC 3.5.3.1; see, e.g., Bach & Killip 1961. Biochim Biophys Acta 47: 336-343; Greenberg 1960, Arginase, in Boyer et al., Eds., The Enzymes, 2nd edn., vol. 4, Academic Press, New York, pp. 257-267). Arginases catalyse enzymatic hydrolysis of the guanidino group, most preferably of Arg residues, to an amino group (Arg → ornithine).
In another preferred embodiment, the protein (P) or the mixture of proteins (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to basic amino acid residue(s), more preferably C-terminally adjacent to Arg and/or hArg and/or Lys, even more preferably C-terminally adjacent to Arg and/or Lys.
Hence, in related embodiments, the protein (P) or protein mixture (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to:
- Arg residues (i.e., n=1, X1=Arg); or
- Arg and/or Lys residues, preferably Arg and Lys residues (i.e., n=2, X1=Arg, X2=Lys); or
- Arg and/or hArg residues, preferably Arg and hArg (i.e., n=2, X1=Arg, X2=hArg); or
- Arg and/or hArg and/or Lys residues, preferably Arg and hArg and Lys residues (i.e., n=3, X1=Arg, X2=hArg, X3=Lys).
Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin. If cleavage after Lys is not desired, as in some of the above embodiments, Lys residues may be suitably blocked before said proteolysis, as described elsewhere in this specification.
In the so obtained protein peptide mixture (PPM), peptides comprising a basic amino acid (preferably Arg, hArg or Lys, more preferably Arg or Lys) as their last residue will thus mainly be derived from the N-termini or internal portions of the starting protein(s), whereas peptides not having such basic amino acid as their last residue will mostly originate from and comprise the C-terminal ends of the starting protein(s). To discriminate between these peptide subsets, the invention contemplates particularly advantageous manners to specifically alter peptides that include such basic amino acids as their last residues.
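The distinction between the two peptide subsets can be sketched as a simple classification by last residue. This is an illustrative sketch only: the function name `split_by_last_residue` and the example peptides are hypothetical, and a protein whose own C-terminal residue is Arg or Lys would be misassigned by this rule, which is why the text above says "mostly".

```python
def split_by_last_residue(peptides, basic="RK"):
    """Separate peptides ending in a basic residue from those that do not."""
    basic_ended = [p for p in peptides if p and p[-1] in basic]
    c_terminal_candidates = [p for p in peptides if p and p[-1] not in basic]
    return basic_ended, c_terminal_candidates


# Hypothetical peptides from a trypsin-type digest of a single protein.
peptides = ["MK", "TAYR", "AK", "QR", "LLE"]

basic_ended, c_terminal_candidates = split_by_last_residue(peptides)

print(basic_ended)            # peptides expected to be altered in later steps
print(c_terminal_candidates)  # candidates for the protein C-terminal end
```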
In a preferred embodiment, peptides of the protein peptide mixture (PPM) that comprise a basic amino acid, preferably Arg, Lys or hArg, even more preferably Arg or Lys, as their last residue may be modified by reacting with carboxypeptidase B, carboxypeptidase U or carboxypeptidase D, more preferably with carboxypeptidase B, which catalyse the specific removal of said last basic residue.
In another embodiment, peptides of the protein peptide mixture (PPM) that comprise a basic amino acid, preferably Lys, as their last residue may be modified by reacting with carboxypeptidase N, which catalyses the specific removal of said last basic residue.
The above disclosed methods to specifically modify or remove the one or more amino acid residues X1, X2... Xn in or from peptides of the protein peptide mixture (PPM) will change one or more properties of the so altered peptides, and thereby allow altered and unaltered peptides to be distinguished and the subset (S) of unaltered peptides to be specifically isolated or enriched. For example, said modification or removal of the one or more amino acid residues X1, X2... or Xn may change one or more chemical or physical characteristics of the altered peptides, or may change their interaction with a capture agent. Without limitation, said modification or residue removal may change one or more of the following peptide characteristics:
- Net charge. For example, a modified amino acid residue may be a stronger or weaker acid or base than the original residue, causing a difference in protonation and charge at particular pH values. Alternatively or in addition, a modifying agent may comprise one or more acidic (such as, e.g., -COOH, -SO2H or -SO3H, etc.) or basic (such as, e.g., -NH2, basic N-containing heteroaryl or heterocyclyl) groups which can confer a charge onto the modified amino acid. Also, removal of an acidic or basic amino acid residue can change the net charge of so altered peptides;
- Hydrophobicity, hydrophilicity. For example, depending on the presence and positions of polar (such as, e.g., -OH, -F, -Cl, =O, etc.) and less- or non-polar (such as, e.g., alkyl, aryl, aryl alkyl, etc.) groups, the polarity of a modified amino acid residue may differ from that of the original residue, thereby changing, e.g., increasing or decreasing the hydrophobicity or hydrophilicity of the altered peptides;
- Binding to capture agents / ligands. For example, the modifying agent may introduce into the altered peptides a group with a strong affinity for binding with a capture agent / ligand, e.g., a biotin-streptavidin binding or hapten-antibody binding or metal ion complexation, whereby said affinity is conferred to altered peptides;
- Size. For example, the modifying agent may introduce into the altered peptides a bulky, voluminous entity which can confer an increase in molecular size to the altered peptides;
- etc.
Preferably, the modification or removal of said one or more residue types X1, X2,... Xn, may change, e.g., increase or decrease, hydrophobicity of the so altered peptides.
For example, in embodiments where peptides are modified at the guanidino group, reaction with the dicarbonyl compound or derivative thereof of formula (I) as defined herein may change, e.g., increase or decrease, hydrophobicity of the so altered peptides. For example, a modification which changes hydrophobicity / hydrophilicity of altered peptides comprising a guanidino group may result from reacting said peptides with a dicarbonyl compound or derivative thereof as defined in any of above embodiments "E10", "E11"; also preferably "E14", "E15"; more preferably "E16" to "E19", such as, e.g., phenylglyoxal, preferably hydroxyphenylglyoxal, more preferably p-hydroxyphenylglyoxal, such as, preferably, trans-phenylglyoxal, more preferably trans-hydroxyphenylglyoxal, and even more preferably trans-p-hydroxyphenylglyoxal; or as defined in any of above embodiments "E4" and "E16", more preferably "E20" to "E22", most preferably with nitromalondialdehyde (NMA).
In another example, in embodiments where peptides are modified by removal of a basic last residue, said removal may change, in particular decrease, the net charge of said peptide.
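The charge decrease on removal of a basic last residue can be illustrated with a deliberately crude estimate. The sketch below assumes a strongly acidic pH at which carboxyl groups are protonated, so that only the free N-terminal amine and the side chains of Arg, Lys and His carry a positive charge; blocked N-termini and modified residues are ignored. The function names and example peptides are hypothetical.

```python
BASIC_RESIDUES = "RKH"


def approx_positive_charge(peptide):
    """Crude charge estimate: free N-terminal amine plus basic side chains."""
    return 1 + sum(peptide.count(residue) for residue in BASIC_RESIDUES)


def remove_last_basic(peptide, removable="RK"):
    """Mimic removal of a C-terminal Arg or Lys; other peptides are unchanged."""
    if peptide and peptide[-1] in removable:
        return peptide[:-1]
    return peptide


peptides = ["TAYR", "AKQR", "LLE"]  # hypothetical examples

# Map each peptide to (charge before, charge after) the simulated removal.
report = {
    p: (approx_positive_charge(p), approx_positive_charge(remove_last_basic(p)))
    for p in peptides
}

for peptide, (before, after) in report.items():
    print(peptide, before, after)
```

Under these assumptions, peptides ending in Arg or Lys lose one positive charge, while the peptide lacking a basic last residue keeps its charge, which is the kind of difference a charge-based separation could exploit.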
Advantageously, such a change in the properties of the altered peptides can affect the chromatographic behaviour of the altered peptides, thus allowing the use of chromatography to distinguish between the altered and unaltered peptides and specifically to isolate or enrich for the subset of unaltered peptides, i.e., peptides comprising C-terminal ends from the protein (P) or protein mixture (PM).
Advantageously, the changed properties of the altered peptides can affect the differential distribution of said altered peptides between the mobile and stationary phase in chromatography, as compared to the unaltered peptides. This affects the migration, in particular the speed of migration, of the altered peptides along the stationary phase in chromatography, such that the altered peptides migrate differently, e.g., faster or slower, than the unaltered peptides. Preferably the altered peptides have a different, e.g., shorter or longer, elution time than the unaltered peptides. The type of chromatography suitable for separating the altered and unaltered peptides and isolating the subset (S) of unaltered peptides can be advantageously chosen based on the nature of the change in the properties of the altered peptides resulting from their modification using the agents of the invention. While such a choice can be made by a skilled person armed with the present teachings, the following examples are provided by way of further guidance and not limitation.
For example, where the amino acid modification or removal changes the net charge of the altered peptides compared to the unaltered peptides, ion exchange chromatography can be advantageously used. Alternatively, where the amino acid modification or removal changes the hydrophobicity or hydrophilicity of the altered peptides compared to the unaltered peptides, reverse phase chromatography, e.g., RP-HPLC, or hydrophobic interaction chromatography may be advantageously used. Yet otherwise, where the amino acid modification or removal changes the affinity of the altered peptides for a particular capture agent / ligand, e.g., an antibody or metal ion, affinity chromatography, e.g., immuno-affinity chromatography or immobilized metal affinity chromatography, may be suitably used.
In a preferred embodiment, said amino acid modification or removal changes the hydrophobicity, e.g., increases or decreases the hydrophobicity, of the altered peptides compared to the unaltered peptides, and reverse phase chromatography, preferably RP-HPLC, is used for separating and isolating the subset (S) of unaltered peptides.
In another preferred embodiment, said amino acid modification or removal changes the hydrophilicity, e.g., increases or decreases the hydrophilicity, of the altered peptides compared to the unaltered peptides, and reverse phase chromatography, preferably RP-HPLC, is used for separating and isolating the subset (S) of unaltered peptides.
For example, various modes and protocols for performing RP chromatography, in particular RP-HPLC, are described in WO 02/077016, incorporated by reference herein.
As can be appreciated by a skilled person, while chromatography is preferred herein, other separation methods, insofar as applicable, may be used to isolate the subset of unaltered peptides, such as, e.g., electrophoresis, including capillary electrophoresis, free flow electrophoresis, capillary zone electrophoresis, capillary electro-chromatography, capillary isoelectric focusing and affinity electrophoresis, as known in the art.
In a preferred embodiment, as detailed in the Summary section, the step (b) of the present method may comprise steps (ba) to (bc): (ba) separating the protein peptide mixture (PPM) into fractions of peptides via chromatography, (bb) reacting at least one and preferably each peptide fraction from step (ba) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, thereby obtaining altered and unaltered peptides for each so reacted fraction, and (bc) isolating the subset of unaltered peptides out of each so reacted fraction via chromatography, wherein the chromatography of steps (ba) and (bc) is performed with the same type of chromatography.
The chromatographic separation in step (ba), i.e., prior to reacting the peptide fractions with agents of the invention, is referred to herein as the "primary run" or the "primary chromatographic step" or the "primary chromatographic separation" or "run 1". The chromatographic separation in step (bc), i.e., subsequent to said reacting, is referred to herein as the "secondary run" or the "secondary chromatographic step" or the "secondary chromatographic separation" or "run 2".
The "same type of chromatography" means that the primary and secondary chromatographic separations are of the same type, in particular that they are both configured to separate the peptides on the basis of the same property, e.g., a physical and/or chemical property of the peptides. For example, but without limitation, the primary and secondary runs can both separate the peptides on the basis of their hydrophobicity (e.g., the primary and secondary runs can both be hydrophobicity chromatography, preferably both be RP chromatography, more preferably both be RP-HPLC); or the primary and secondary runs can both separate the peptides on the basis of their net charge (e.g., the primary and secondary runs can both be ion exchange chromatography, preferably both be cation exchange chromatography or preferably both be anion exchange chromatography); or the primary and secondary runs can both separate the peptides on the basis of their bulk size (e.g., the primary and secondary runs can both be size exclusion chromatography), etc. Accordingly, when the agents of the invention change peptides such as to change the property or properties distinguished by said chromatography, the altered peptides will show a migration shift in the secondary run vis-a-vis the primary run, which allows for the separation of the altered and unaltered peptides and for isolation of the latter from the peptide fraction.
Separating the protein peptide mixture into fractions in the primary run and analysing each fraction separately (or, alternatively, analysing suitably pooled fractions, see below) in the secondary run advantageously ensures that the altered peptides from a particular fraction do not co-migrate with, and therefore can be distinguished from, unaltered peptides of one or more other fractions. Preferably, the method contemplates that each fraction from the primary run, having been subjected to reaction with the agents of the invention, can be separated in the secondary run into a fraction containing the unaltered peptides and a fraction containing the altered peptides. The respective elution time windows for said fraction containing the unaltered peptides and for said fraction containing the altered peptides do not overlap. Herein, the "elution time window" of a fraction refers to the time window within which at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% or at least 99% of the peptides of said fraction elute. This allows for unambiguous isolation of the subset of unaltered peptides. Preferably, the time distance between the respective elution time windows for the fraction containing the unaltered peptides and for the fraction containing the altered peptides equals at least 0.25x, more preferably at least 0.5x, even more preferably at least 0.75x, yet more preferably at least 1x, and still more preferably at least 1.5x, or 2x or more, of the duration of the elution time window for the fraction containing the unaltered peptides. Further considerations applicable to this issue can be found in WO 02/077016 (p. 12, l. 20 through p. 14, l. 5 and p. 18, l. 5-32 and Figure 1), incorporated herein by reference.
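The separation criterion described above can be illustrated with a minimal sketch. The names `Window`, `time_distance` and `is_resolved`, the use of minutes as the time unit, and the example window values are assumptions made purely for illustration and do not form part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Elution time window of a fraction (minutes; illustrative unit)."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def time_distance(a: Window, b: Window) -> float:
    """Gap between two windows; zero or negative means they touch or overlap."""
    return max(a.start, b.start) - min(a.end, b.end)


def is_resolved(unaltered: Window, altered: Window, min_factor: float = 0.25) -> bool:
    """Check that the windows do not overlap and that their time distance is at
    least `min_factor` times the duration of the unaltered-peptide window."""
    gap = time_distance(unaltered, altered)
    return gap > 0 and gap >= min_factor * unaltered.duration


# Hypothetical windows: unaltered peptides elute at 20-22 min, altered at 23-25 min.
unaltered = Window(20.0, 22.0)
altered = Window(23.0, 25.0)
print(is_resolved(unaltered, altered, min_factor=0.5))
```

The factor is expressed relative to the unaltered-peptide window, as in the text, so the same check applies whether the altered peptides shift to earlier or later elution times.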
It shall be appreciated that said time distance between the fraction containing the unaltered peptides and that containing the altered peptides will depend on a number of factors, such as, e.g., the agent used for peptide modification; the width of the fraction from the primary run subjected to the secondary run, the chromatographic conditions (e.g., type of stationary and mobile phase, buffers, etc.), and the like, which can be generally optimised by a skilled person armed with the present teachings.
In a preferred embodiment, the chromatographic conditions of the primary run and the secondary run are identical or, for a person skilled in the art, substantially similar. Substantially similar means for instance that small changes in flow and/or gradient and/or temperature and/or pressure and/or chromatographic beads and/or solvent composition, or the like, are tolerated between run 1 and run 2 as long as the chromatographic conditions lead to an elution of the altered peptides that is predictably distinct from the unaltered peptides and this for every fraction collected from run 1.
In order to ensure a comprehensive analysis of peptides in the protein peptide mixture, the reaction with the agents of the invention should preferably be effective in each of the so analysed peptide fractions from the primary run, such that in each fraction obtained from said primary run, the altered peptides will migrate distinctly from the unaltered peptides in the secondary chromatographic step.
In an embodiment, each fraction from the primary run may be individually reacted with an agent of the invention and subsequently individually separated into altered and unaltered peptides in the secondary run. However, the method may be streamlined if said reacting and/or the secondary chromatographic run is performed with pools of two or more fractions from the primary run. In particular, when fractions from the primary run are pooled, each of said fractions yields, in the secondary run, fractions of altered peptides and unaltered peptides, the respective elution time windows of which should not overlap (see above).
Moreover, the elution time window of any fraction obtained in the secondary run from a given primary run-fraction should not overlap with the elution time window of any fraction obtained in the secondary run from any other primary run-fraction in the pool.
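The pooling constraint set out above, namely that no two secondary-run windows within a pool may overlap, can be sketched as follows. The function names, the label scheme and the window values are hypothetical and serve only to illustrate the check.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two (start, end) windows share any elution time."""
    return a[0] < b[1] and b[0] < a[1]


def pool_conflicts(pool):
    """Return the pairs of labels whose secondary-run windows overlap.

    `pool` maps a label such as ("F1", "unaltered") to a (start, end) window.
    An empty result means the fractions can be pooled under this criterion.
    """
    return [
        (label_a, label_b)
        for (label_a, win_a), (label_b, win_b) in combinations(pool.items(), 2)
        if overlaps(win_a, win_b)
    ]


# Hypothetical pool of two primary-run fractions, F1 and F5.
pool = {
    ("F1", "altered"): (6.0, 8.0),
    ("F1", "unaltered"): (10.0, 11.0),
    ("F5", "altered"): (12.0, 13.5),
    ("F5", "unaltered"): (14.0, 15.0),
}
print(pool_conflicts(pool))
```

Returning the conflicting pairs rather than a single flag makes it easier to see which primary-run fraction should be moved to a different pool.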
In a further preferred embodiment, buffers and/or solvents used in both chromatographic steps are compatible with the conditions required to allow an efficient alteration of peptides with an agent of the invention. In a particularly preferred embodiment the nature of the solvents and buffers in the primary run, the secondary run and the alteration step is identical or substantially similar. In a further preferred embodiment said buffers and solvents are compatible with the conditions required to perform a mass spectrometric analysis.
In cases where the reaction with the dicarbonyl compound or derivative requires specific reaction conditions which are not compatible with the buffers used in the primary and/or secondary run, such conditions can be suitably changed before the alteration step and/or after the alteration step, the change being performed by methods described in the art such as, for example, an extraction, a lyophilisation and redissolving step, a precipitation and redissolving step, a dialysis against an appropriate buffer/solvent or even a fast reverse phase separation with a steep gradient, etc.
Analogously, application of a pre-treatment step as mentioned herein above may require such changing in buffers or conditions before the first run. Such changing in buffers or conditions may also be required before analysis of the peptides after the secondary run, etc.
It shall be appreciated that the invention is also directed to a peptide sorter device that is able to carry out the methods of the invention, in particular the methods as above comprising primary and secondary chromatographic runs of the same type wherein the peptide fractions from run 1 are modified with an agent of the invention before separation in run 2. The term "peptide sorter" refers to a device that efficiently separates unaltered peptides from the altered peptides. In a preferred embodiment, identical or very similar chromatographic conditions are used in the peptide sorter in the two chromatographic runs such that during the secondary run the unaltered peptides stay at their original elution times and the altered peptides undergo a shift in the elution time. As described herein, a peptide sorter particularly refers to the pooling of fractions obtained after run 1 and the optimal organisation of the second chromatographic step to speed up the isolation of the unaltered peptides out of each of the run 1 fractions. Accordingly, in a preferred embodiment, the invention relates to a system for sorting peptides comprising: a primary chromatographic column for separating a protein peptide mixture (PPM) into a plurality of fractions under a defined set of conditions, whereby each fraction is subsequently reacted with an agent of the invention as defined herein, and wherein the so-reacted fractions are pooled into a set of pooled fractions, each pooled fraction comprising at least two so-reacted fractions; and a set of secondary chromatographic columns comprising a first secondary chromatographic column for separating a first pooled fraction and at least a second secondary chromatographic column arranged in parallel with the first secondary chromatographic column for separating a second pooled fraction; wherein the set of secondary chromatographic columns performs isolation of the unaltered peptides under conditions substantially identical to the defined set of conditions, whereby there is no elution overlap between i) the unaltered peptides from different fractions within one pool or between pools and ii) the unaltered peptides and the altered peptides.
Preferably, the said defined set of conditions is configured to maximise the chromatographic separation between peptides modified with the respective agent of the invention and the unaltered peptides. Preferably, said set of conditions may be optimised for situations wherein the agent is a dicarbonyl compound or derivative thereof of formula (I) as defined herein which modifies a guanidino group, preferably of Arg and/or hArg, more preferably forms an adduct of any of formulas (VIa) or (VIb), or (VIIa) or (VIIb); or wherein the agent is peptidylarginine deiminase or arginase, which modify a guanidino group, preferably of Arg; or wherein the agent removes a basic last residue, such as carboxypeptidase B, U, D or N, preferably carboxypeptidase B. Preferably, the defined set of conditions involves hydrophobicity chromatography, preferably RP chromatography, more preferably RP-HPLC.
Accordingly, the invention also provides for methods as described above performed in conjunction with the peptide sorter devices as described herein.
While the present invention contemplates peptide sorter devices which employ conditions that are specifically configured to achieve maximum separation of the unaltered peptides from peptides altered as taught herein, and uses thereof, further features and operation of said peptide sorters may be essentially as disclosed in WO 02/077016 (especially on p. 38, l. 15 through p. 54, l. 5, and p. 80, l. 23 through p. 88, l. 16, incorporated herein by reference).
As noted, in a preferred embodiment, the invention provides a method for isolating or enriching, from a protein peptide mixture (PPM) obtained from a protein (P) or a mixture of proteins (PM), the peptides that comprise the C-terminal ends of said protein (P) or mixture of proteins (PM), comprising the steps of:
(i) fragmenting the protein (P) or the mixture of proteins (PM) preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture (PPM), wherein said one or more amino acid residue types X1, X2,... Xn are basic;
(ii) isolating, under conditions where the majority of the C-terminal -C(=O)OH groups of the peptides of the protein peptide mixture (PPM) are dissociated, i.e., -C(=O)O⁻, the majority of the N-terminal -NH2 groups of said peptides are protonated, i.e., -NH3⁺, and the basic side chain moiety of the majority of basic amino acids adjacent to which the protein (P) or the mixture of proteins (PM) were proteolysed are protonated, a subset (S') of peptides from the protein peptide mixture (PPM), wherein the peptides of said subset (S') have about zero net charge under said conditions; and
(iii) isolating a subset (S) of peptides from said subset of peptides (S'), comprising the steps of: reacting the subset (S') with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset (S) of peptides unaltered by said reacting.
In a preferred embodiment, in above step (ii) the conditions are such that the acidic side chain moiety of the majority of acidic amino acids of the peptides of above step (i), and particularly the -COOH moiety of aspartic acid and glutamic acid residues, is not dissociated. This can reduce the confounding effect of such acidic moieties on the method.
In a preferred embodiment, the conditions in above step (ii) encompass pH between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5, e.g., between 2.75 and 3.25, and still more preferably about 3, such as, e.g., 2.80, 2.85, 2.90, 2.95, 3.0, 3.05, 3.10, 3.15 or 3.20.
In an embodiment, the fragmentation, e.g., proteolysis, in step (i) occurs preferentially at peptide bonds C-terminally adjacent to all types of basic amino acid residues.
In an embodiment, the fragmentation in step (i) is preferentially at peptide bonds C-terminally adjacent to arginine and/or lysine and/or homoarginine and/or histidine residues. Further preferred embodiments are noted in the Summary section. Hence, in related embodiments, the protein (P) or protein mixture (PM) can be fragmented preferentially at peptide bonds C-terminally adjacent to:
- Arg residues (i.e., n=1, X1=Arg); or
- Arg and/or Lys residues, preferably Arg and Lys residues (i.e., n=2, X1=Arg, X2=Lys); or
- Arg and/or hArg residues, preferably Arg and hArg residues (i.e., n=2, X1=Arg, X2=hArg); or
- Arg and/or hArg and/or Lys residues, preferably Arg and hArg and Lys residues (i.e., n=3, X1=Arg, X2=hArg, X3=Lys).
Such fragmentation may be advantageously achieved using trypsin type endopeptidases, preferably trypsin, as taught elsewhere in this specification. As noted, trypsin does not cleave after histidine. Therefore, a minor fraction of C-terminal peptides which contain a histidine residue may have an above zero net charge under the conditions of step (ii) and may not be isolated within the subset (S') of C-terminal peptides (this however does not diminish the usefulness of the method).
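The charge-based reasoning of steps (i) and (ii) can be illustrated by a minimal sketch. It assumes idealised cleavage after every Lys and Arg (no missed cleavages and no proline exception), fully protonated N-termini and basic side chains, fully dissociated C-termini and undissociated acidic side chains, as contemplated for step (ii). The function names and the example sequence are hypothetical.

```python
def tryptic_peptides(protein):
    """Idealised digest: cleave C-terminally of every K or R (assumption)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def nominal_charge(peptide, n_term_blocked=False):
    """Nominal net charge under the step (ii) conditions: -1 for the C-terminus,
    +1 for a free N-terminus, +1 per K, R or H; Asp and Glu are taken as neutral."""
    charge = -1
    if not n_term_blocked:
        charge += 1
    charge += sum(peptide.count(residue) for residue in "KRH")
    return charge


def neutral_subset(protein, n_term_blocked=False):
    """Peptides with about zero net charge, i.e., candidates for subset (S')."""
    peptides = tryptic_peptides(protein)
    return [
        peptide
        for index, peptide in enumerate(peptides)
        if nominal_charge(peptide, n_term_blocked and index == 0) == 0
    ]


# Hypothetical sequence used only for illustration.
print(neutral_subset("MASKDELRHGTKLLEDF"))
```

In this idealised picture each internal peptide carries one extra positive charge from its C-terminal Lys or Arg, whereas the C-terminal peptide of the protein does not, which is why it falls into the about-zero-charge subset unless it contains histidine.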
Where the proteolysis does not take place C-terminally adjacent to each type of basic amino acid, in particular C-terminally adjacent to each of arginine, lysine, histidine and homoarginine, the charge of at least some or all of the basic amino acid types after which the proteolysis does not occur may be preferably neutralised by suitable modification, such as, e.g., by acetylation, such as, e.g., of lysines, or modification of His by diethylpyrocarbonate.
In an embodiment, said subset (S') of peptides can be isolated using a technique capable of separating peptides on the basis of net charge, including, but not limited to, ion exchange chromatography, isoelectric focusing of peptides, and zwitterionic ion exchange chromatography (see, e.g., WO 00/27496).
In a preferred embodiment, said subset (S') of peptides can be isolated using ion exchange chromatography. In a preferred embodiment, said subset (S') of peptides can be isolated using cation exchange chromatography. Herein, positively charged peptides will be more strongly retained by the stationary phase, whereas substantially uncharged peptides, in particular the C-terminal and optionally blocked N-terminal peptides of interest, will elute faster and can be recovered from said eluate. In a preferred embodiment, said subset (S') of peptides can be isolated using strong cation exchange (SCX) chromatography.
The methods and devices of the preceding aspects find particular use in proteomics applications, and particularly in gel-free proteomic applications. In preferred embodiments, the methods allow for the specific isolation of peptides comprising the C-terminal ends of proteins, e.g., proteins of complex protein mixtures such as biological samples. Said peptides comprising C-terminal ends of proteins are highly representative of the originating proteins and as such serve as identification elements for their corresponding proteins.
The present invention therefore further provides a method to identify a subset of peptides isolated from a protein peptide mixture, in particular isolated peptides comprising C-terminal ends of proteins, and their corresponding proteins in a sample comprising proteins. Thereto the isolation of peptides, in particular peptides comprising C-terminal ends of proteins, according to any of the embodiments of the invention is further coupled to analysis of so isolated peptides.
In a preferred approach peptide analysis of the isolated peptides is performed with a mass spectrometer. However, said isolated peptides can also be further analysed and identified using other methods such as electrophoresis, activity measurement in assays, analysis with specific antibodies, Edman sequencing, etc.
An analysis or identification step can be carried out in different ways. In one way, the isolated peptides, e.g., eluting from a chromatographic column, are passed directly to the analyzer. In an alternative approach, isolated peptides are collected in fractions, and the fractions may or may not be manipulated before going into further analysis or identification. An example of such manipulation consists of a concentration step, followed by spotting each concentrate on, for instance, a MALDI target for further analysis and identification.
In a preferred embodiment the isolated peptides are analysed with high-throughput mass spectrometric techniques. The information obtained is the mass of the isolated peptides. When the peptide mass is very accurately defined, such as with a Fourier transform mass spectrometer (FTMS), using an internal calibration procedure (e.g., O'Connor and Costello 2000, Anal Chem 72: 5881-5885), it is possible to correlate unambiguously the peptide mass with the mass of a corresponding peptide in peptide mass databases and as such identify the isolated peptide. In further developments, to improve unambiguous identification of the peptide, data about the mass of the peptide can be complemented with other information, such as, e.g., described in WO 02/077016 (p. 22, l. 28 through p. 29, l. 8, incorporated herein by reference). A yet further piece of information that can be used to identify isolated peptides is the grand average of hydropathicity (GRAVY) of the peptides, reflected in the elution times during chromatography. Two or more peptides, with identical masses or with masses that fall within the error range of the mass measurements, can be distinguished by comparing their experimentally determined GRAVY with the in silico predicted GRAVY. Another piece of information to identify isolated peptides may be the normalised elution time (NET), see, e.g., Norbeck et al. 2005 (J Am Soc Mass Spectrom 16: 1239-49).
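By way of illustration only, the use of GRAVY to distinguish candidates that fall within the same mass error range may be sketched as below. The sketch assumes the Kyte-Doolittle hydropathy scale, which the present text does not name. The candidate sequences, their masses, the measured mass and the observed GRAVY value are invented placeholders, and how an experimental GRAVY is derived from elution time is left outside the sketch.

```python
# Kyte-Doolittle hydropathy values (assumed scale for the GRAVY calculation).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(HYDROPATHY[residue] for residue in peptide) / len(peptide)


def within_ppm(measured, theoretical, tolerance_ppm):
    """True if the measured mass lies within the ppm tolerance of the candidate."""
    return abs(measured - theoretical) / theoretical * 1e6 <= tolerance_ppm


def best_candidate(measured_mass, observed_gravy, candidates, tolerance_ppm=5.0):
    """Among candidates within the mass tolerance, pick the one whose predicted
    GRAVY is closest to the experimentally derived value; None if none match."""
    matches = [
        sequence
        for sequence, mass in candidates
        if within_ppm(measured_mass, mass, tolerance_ppm)
    ]
    if not matches:
        return None
    return min(matches, key=lambda sequence: abs(gravy(sequence) - observed_gravy))


# Placeholder candidates: the masses are NOT real peptide masses.
candidates = [("AVLIG", 500.0000), ("STNQG", 500.0010)]
print(best_candidate(500.0005, 2.5, candidates))
```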
Any mass spectrometer may be used to analyze the isolated peptides. Non-limiting examples of mass spectrometers include the matrix-assisted laser desorption/ionization ("MALDI") time-of-flight ("TOF") mass spectrometer (MALDI-TOF-MS), available from PerSeptive Biosystems, Framingham, Massachusetts; the Ettan MALDI-TOF from AP Biotech and the Reflex III from Bruker Daltonik, Bremen, Germany, for use in post-source decay analysis; the electrospray ionization (ESI) ion trap mass spectrometer, available from Finnigan MAT, San Jose, California; the ESI quadrupole mass spectrometer, available from Finnigan MAT, or the QSTAR Pulsar Hybrid LC/MS/MS system of Applied Biosystems Group, Foster City, California; and a Fourier transform mass spectrometer (FTMS) using an internal calibration procedure (O'Connor and Costello, 2000).
Protein identification software used in the present invention to compare the experimental mass spectra of the peptides with a database of the peptide masses and the corresponding proteins is available in the art. One such program, ProFound, uses a Bayesian algorithm to search protein or DNA databases to identify the optimum match between the experimental data and the protein in the database. ProFound may be accessed on the World-Wide Web at http://prowl.rockefeller.edu and http://www.proteometrics.com. ProFound accesses the non-redundant database (NR). Peptide Search can be accessed at the EMBL website. See also Chaurand P. et al. (1999) J. Am. Soc. Mass. Spectrom 10, 91; Patterson S. D. (2000) Am. Physiol. Soc, 59-65; Yates JR (1998) Electrophoresis, 19, 893. MS/MS spectra may also be analysed by MASCOT (available at http://www.matrixscience.com, Matrix Science Ltd. London).
In another preferred embodiment isolated peptides are individually subjected to fragmentation in the mass spectrometer. In this way information about the mass of the peptide is further complemented with (partial) sequence data about the peptide. Comparing this combined information with information in peptide mass and peptide and protein sequence databases allows identification of the peptides. In one approach fragmentation of the peptides is most conveniently done by collision induced dissociation (CID) and is generally referred to as MS2 or tandem mass spectrometry. Alternatively, peptide ions can decay during their flight after being volatilized and ionized in a MALDI-TOF-MS. This process is called post-source decay (PSD). In one such mass spectrometric approach, selected peptides are transferred directly or indirectly into the ion source of an electrospray mass spectrometer and then further fragmented in the MS/MS mode. Thus, in one aspect, partial sequence information of the peptides is collected from the MSn fragmentation spectra (where it is understood that n is greater than or equal to 2) and used for peptide identification in sequence databases described herein.
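As a minimal sketch of how partial sequence information relates to fragment masses, the following computes singly charged b- and y-type fragment ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The b/y ion nomenclature and the function name are not taken from the present text and are used here as an assumption about how CID spectra are commonly interpreted.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the N-terminal fragment of i residues,
    y_ions[i-1] is the C-terminal fragment of i residues.
    """
    masses = [RESIDUE_MASS[residue] for residue in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("LLEDF")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Mass differences between consecutive ions of one series correspond to residue masses, which is what allows a (partial) sequence to be read from a fragmentation spectrum.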
The present invention further provides a method for the identification of one or more proteins in a sample comprising proteins. On the one hand it is known that cleavage of a sample comprising proteins results in a protein peptide mixture comprising thousands of peptides and this overwhelms the resolving power of the currently available chromatographic systems and mass spectrometry systems. On the other hand it is known that a protein can be identified based on the identification of one or more of its constituting peptides. The current invention provides methods to isolate specific subsets of peptides, and preferably peptides comprising C-terminal ends of the proteins in the sample. This simplification of the original peptide mixture significantly reduces the co-elution of peptides in the secondary run and results in an efficient identification of the isolated peptides with analysers such as mass spectrometers or others. Since the isolated peptides, preferably C-terminal peptides of proteins, are most often unique identification elements for their corresponding parent proteins, identification of said peptides allows the identification of the proteins in the original sample comprising proteins. So, the task of identifying proteins in a sample comprising proteins by isolating and identifying one or more of their composite peptides becomes possible with the methods of the present invention.
The present invention therefore further provides a method to identify proteins in a sample comprising proteins, comprising isolating a subset of peptides comprising C-terminal ends of said proteins using the methods of the invention, and identifying the isolated peptides of said subset and their corresponding proteins. In a preferred embodiment, where the peptide isolation involves the primary and secondary chromatographic runs with, in between, the modification with the agent of the invention, the isolated unaltered peptides can be separately identified in each of the unaltered peptide fractions obtained in the secondary runs.
It is clear for a person skilled in the art that these embodiments of the invention are equally applicable when there is a pre-treatment of the proteins or the peptides prior to their isolation, as also described here above. It is equally obvious for a person skilled in the art that, starting from the known identity of an isolated peptide, the identity of the corresponding protein can be easily determined by screening peptide, protein and DNA sequence databases. Both the databases and the software to screen are available in the art.
It is further important to mention that the invention allows the identification of a whole range of proteins in a sample comprising proteins, varying for instance from high to low abundance, from acidic to basic, from small to large, from soluble to membrane proteins. Furthermore, the invention provides a method to identify proteins in a sample comprising proteins, starting from very small amounts of cells, e.g., perhaps as few as 50,000 human cells, as well as, obviously, from larger numbers of cells.
In another embodiment, the present invention provides a method to determine the relative amount of one or more proteins in two or more samples comprising proteins. The method comprises the use of differentially isotopically labelled isolated peptides, preferably peptides comprising C-terminal ends of proteins. In this method, the two samples are treated in such a way that the peptides isolated from one sample contain one isotope and the peptides isolated from a second sample contain another isotope of the same element.
Hence, the method comprises the steps of (a) labelling the peptides present in a first sample with a first isotope; (b) labelling the peptides present in a second sample with a second isotope; (c) combining the protein peptide mixture of the first sample with the protein peptide mixture of the second sample; (d) isolating a subset of peptides comprising C-terminal ends of the proteins using the methods of the invention; (e) performing mass spectrometric analysis of the isolated peptides; (f) calculating the relative amounts of the isolated peptides in each sample by comparing the peak heights of the identical but differentially isotopically labelled isolated peptides; and (g) determining the identity of the isolated peptide and its corresponding protein. In a preferred embodiment, where the peptide isolation involves the primary and secondary chromatographic runs with, in between, the modification with the agent of the invention, the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs. It is obvious that the same approach can be followed in combination with a pre-treatment step as mentioned here above. It is also obvious that, instead of mixing the peptides from both samples in step (c), peptides from a first and a second sample can be separately subjected to step (d) and be combined in any of the sub-steps of step (d) or in step (e). The differential isotopic labelling of the peptides in a first and a second sample can be done in many different ways available in the art. A key element is that a particular peptide originating from the same protein in a first and a second sample is identical, except for the presence of a different isotope in one or more amino acids of the peptide.
In a typical embodiment the isotope in a first sample will be the natural isotope, referring to the isotope that is predominantly present in nature, and the isotope in a second sample will be a less common isotope, hereinafter referred to as an uncommon isotope. Examples of pairs of natural and uncommon isotopes are H and D, 12C and 13C, 14N and 15N. Peptides labelled with the heaviest isotope of an isotopic pair are herein also referred to as heavy peptides. Peptides labelled with the lightest isotope of an isotopic pair are herein also referred to as light peptides. For instance, a peptide labelled with H is called the light peptide, while the same peptide labelled with D is called the heavy peptide. Peptides labelled with a natural isotope and their counterparts labelled with an uncommon isotope are chemically very similar, separate chromatographically in the same manner and also ionize in the same way. However, when the peptides are fed into an analyser, such as a mass spectrometer, they will segregate into the light and the heavy peptide. The heavy peptide has a slightly higher mass due to the higher weight of the incorporated, chosen isotopic label. Because of the minor difference between the masses of the differentially isotopically labelled peptides, the result of the mass spectrometric analysis of isolated peptides will be a plurality of pairs of closely spaced twin peaks, each twin peak representing a heavy and a light peptide. Each of the heavy peptides originates from the sample labelled with the heavy isotope; each of the light peptides originates from the sample labelled with the light isotope. The ratios (relative abundance) of the peak intensities of the heavy and the light peak in each pair are then measured. These ratios give a measure of the relative amount (differential occurrence) of that peptide (and its corresponding protein) in each sample. The peak intensities can be calculated in a conventional manner (e.g., by calculating the peak height or peak surface).
As described hereinabove, the isolated peptides can also be identified, allowing the identification of proteins in the samples. If a protein is present in one sample but not in another, the isolated peptide (corresponding with this protein) will be detected as one peak which can contain either the heavy or the light isotope. However, in some cases it can be difficult to determine which sample generated the single peak observed during mass spectrometric analysis of the combined sample. This problem can be solved by double labelling the first sample, either before or after the proteolytic cleavage, with two different isotopes or with two different numbers of heavy isotopes. Examples of labelling agents are acylating agents.
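The calculation of relative amounts from twin peaks can be sketched as follows. The function name, the convention of reporting heavy over light, the treatment of single peaks and the intensity values are assumptions made for illustration only.

```python
import math


def heavy_to_light_ratios(peak_pairs):
    """Compute heavy/light intensity ratios for each isolated peptide.

    `peak_pairs` maps a peptide to (light_intensity, heavy_intensity), where an
    intensity is, e.g., a peak height or peak surface. A peptide seen only as a
    heavy peak yields infinity, one seen only as a light peak yields 0.0, and a
    peptide with no detectable peak is skipped.
    """
    ratios = {}
    for peptide, (light, heavy) in peak_pairs.items():
        if light <= 0 and heavy <= 0:
            continue
        ratios[peptide] = heavy / light if light > 0 else math.inf
    return ratios


# Invented intensities for three hypothetical C-terminal peptides.
peak_pairs = {
    "LLEDF": (1000.0, 2000.0),   # twin peak, heavy sample twice as abundant
    "AVLIG": (500.0, 0.0),       # single peak, light sample only
    "STNQG": (0.0, 300.0),       # single peak, heavy sample only
}
print(heavy_to_light_ratios(peak_pairs))
```

As the text notes, a single peak can be ambiguous in practice; the sketch assumes that the heavy or light nature of each peak has already been established.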
Incorporation of the natural and/or uncommon isotope in peptides can be obtained in multiple ways. In one approach proteins are labelled in the cells. Cells for a first sample are for instance grown in media supplemented with an amino acid containing the natural isotope and cells for a second sample are grown in media supplemented with an amino acid containing the uncommon isotope. This method is well known in the art, e.g., SILAC (stable isotope labelling with amino acids in cell culture), e.g., as in Ong et al. 2002 (Mol Cell Proteomics 1(5): 376-86) and further developments thereof.
Mixing of the proteins/peptides from both samples can be done at different time points. The mixing can be done at the level of the sample (e.g. mixing an equal number of cells from both samples) or proteins can be isolated separately from sample 1 and sample 2 and subsequently mixed or proteins from sample 1 are digested into peptides and proteins from sample 2 are digested into peptides and the peptides originating from sample 1 and sample 2 are mixed, etc.
Incorporation of the differential isotopes can further be obtained with multiple labelling procedures based on known chemical reactions that can be carried out at the protein or the peptide level. For example, proteins can be changed by the guanidination reaction with O-methylisourea, converting NH2-groups into guanidinium groups, thus generating homoarginine at each previous lysine position. Proteins from a first sample can be reacted with a reagent with the natural isotopes and proteins from a second sample can be reacted with a reagent with an uncommon isotope. Peptides could also be changed by Schiff base formation with deuterated acetaldehyde followed by reduction with normal or deuterated sodium borohydride. This reaction, which is known to proceed in mild conditions, may lead to the incorporation of a predictable number of deuterium atoms. Peptides will be changed either at the α-NH2 group, or at the ε-NH2 groups of lysines, or on both. Similar changes may be carried out with deuterated formaldehyde followed by reduction with NaBD4, which will generate a trideutero-methylated form of the amino groups. The reaction with formaldehyde could be carried out either on the total protein, incorporating deuterium only at lysine side chains, or on the peptide mixture, where both the α-NH2 and lysine-derived NH2-groups will be labelled. Since arginine does not react, this also provides a method to distinguish between Arg- and Lys-containing peptides.
In a further preferred embodiment, the samples may be differentially labelled using the iTRAQ technology with isobaric reagents that tag amine groups, essentially as taught in Ross et al. 2004 (Mol Cell Proteomics 3(12): 1154-69). These tags are preferably used in conjunction with tandem MS mode (in which peptides are isolated and fragmented), in which each tag generates a unique reporter ion.
Primary amino groups are easily acylated with, for example, acetyl N-hydroxysuccinimide (ANHS). Thus, one sample can be acetylated with normal ANHS whereas a second sample can be acylated with CD3CO-NHS. Also the ε-NH2 group of all lysines is in this way derivatized in addition to the amino-terminus of the peptide. Still other labelling methods are for example acetic anhydride which can be used to acetylate hydroxyl groups and trimethylchlorosilane which can be used for less specific labelling of functional groups including hydroxyl groups and amines.
In yet another approach the primary amino acids are labelled with chemical groups allowing differentiation between the heavy and the light peptides by 5 amu, by 6 amu, by 7 amu, by 8 amu or even by larger mass difference. Alternatively, the differential isotopic labelling is carried out at the carboxy-terminal end of the peptides, allowing the differentiation between the heavy and light variants by more than 5 amu, 6 amu, 7 amu, 8 amu or even larger mass differences. Since the methods of the present invention do not require any prior knowledge of the type of proteins that may be present in the samples, they can be used to determine the relative amounts of both known and unknown proteins which are present in the samples examined. The methods provided in the present invention to determine relative amounts of at least one protein in at least two samples can be broadly applied to compare protein levels in for instance cells, tissues, or biological fluids (e.g. nipple aspiration fluid, saliva, sperm, cerebrospinal fluid, urine, serum, plasma, synovial fluid), organs, and/or complete organisms. Such a comparison includes evaluating subcellular fractions, cells, tissues, fluids, organs, and/or complete organisms which are, for example, diseased and non-diseased, stressed and non-stressed, drug-treated and non drug-treated, benign and malignant, adherent and nonadherent, infected and uninfected, transformed and untransformed. The method also allows the comparison of protein levels in subcellular fractions, cells, tissues, fluids, organisms, complete organisms exposed to different stimuli or in different stages of development or in conditions where one or more genes are silenced or over-expressed or in conditions where one or more genes have been knocked-out. 
In another embodiment, the methods described herein can also be employed in diagnostic assays for the detection of the presence, the absence or a variation in expression level of one or more protein markers or a specific set of proteins indicative of a disease state (e.g., such as cancer, neurodegenerative disease, inflammation, cardiovascular diseases, viral infections, bacterial infections, fungal infections or any other disease). Specific applications include the identification of target proteins which are present in metastatic and invasive cancers, the differential expression of proteins in transgenic mice, the identification of proteins that are up- or down-regulated in diseased tissues, the identification of intracellular changes in cells with physiological changes such as metabolic shift, the identification of biomarkers in cancers, and the identification of signalling pathways. The present invention further provides a method to quantitate the amount of one or more proteins in a single sample comprising proteins. The method comprises the steps of: (a) preparing a protein peptide mixture; (b) adding to the mixture a known amount of a synthetic reference peptide labelled with an isotope distinguishable from the reference peptide isotope; (c) isolating a subset of peptides comprising C-terminal ends of the proteins using the methods of the invention; (d) performing mass spectrometric analysis of the isolated peptides; and (e) determining the amount of the protein present in the sample by comparing the peak heights of the synthetic reference peptide to the reference peptide. In a preferred embodiment, where the peptide isolation involves the primary and secondary chromatographic runs with in between the modification with the agent of the invention, the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs.
It is obvious that the same methods can be followed in combination with a pre-treatment step as mentioned herein above.
"Reference peptides" as used herein are peptides whose sequence and/or mass is sufficient to unambiguously identify its parent protein. By preference, peptide synthesis of equivalents of reference peptides is easy. For the sake of clarity, a reference peptide as used herein is the native peptide as observed in the protein it represents, while a synthetic reference peptide as used herein is a synthetic counterpart of the same peptide. Such synthetic reference peptide is conveniently produced via peptide synthesis but can also be produced recombinantly. Peptide synthesis can for instance be performed with a multiple peptide synthesizer. Recombinant production can be obtained with a multitude of vectors and hosts as widely available in the art. Reference peptides by preference ionize well in mass spectrometry. A non-limiting example of a well ionizing reference peptide is a reference peptide which contains an arginine. By preference a reference peptide is also easy to isolate as above. In the latter preferred embodiment the reference peptide is simultaneously also an isolated peptide.
A reference peptide and its synthetic reference peptide counterpart are chemically very similar, separate chromatographically in the same manner and also ionize in the same way. The reference peptide and its synthetic reference peptide counterpart are however differentially isotopically labelled. As a consequence, in a preferred embodiment whereby the reference peptide is also an isolated peptide, the reference peptide and its synthetic reference peptide counterpart are altered in a similar way and are co-isolated, e.g., in the same fraction of the primary and the secondary run and in an eventual ternary run. However, when a reference peptide and its synthetic reference peptide are fed into an analyzer, such as a mass spectrometer, they will segregate into the light and heavy peptide. The heavy peptide has a slightly higher mass due to the higher weight of the incorporated chosen heavy isotope. Because of this very small difference in mass between a reference peptide and its synthetic reference peptide, both peptides will appear as a recognizable closely spaced twin peak in a mass spectrometric analysis. The ratio between the peak heights or peak intensities can be calculated and these determine the ratio between the amount of reference peptide versus the amount of synthetic reference peptide. Since a known absolute amount of synthetic reference peptide is added to the protein peptide mixture, the amount of reference peptide can be easily calculated and the amount of the corresponding protein in the sample comprising proteins can be calculated.
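The calculation outlined above reduces to a simple proportion, sketched here in Python. The function name and the numerical values are hypothetical; the sketch assumes that both peptides ionize identically (as stated above) and that one reference peptide is released per molecule of the corresponding protein.

```python
def reference_peptide_amount(ref_intensity: float,
                             synthetic_intensity: float,
                             synthetic_amount: float) -> float:
    """Amount of native reference peptide, in the same unit as synthetic_amount.

    The twin-peak intensity ratio (reference / synthetic) is scaled by the
    known amount of synthetic reference peptide added to the mixture.
    """
    if synthetic_intensity <= 0:
        raise ValueError("the synthetic reference peptide peak must be observed")
    return synthetic_amount * ref_intensity / synthetic_intensity


# Hypothetical example: 50 fmol of synthetic peptide spiked in, and the
# native peak is three times as intense as the synthetic peak.
amount_fmol = reference_peptide_amount(ref_intensity=300.0,
                                       synthetic_intensity=100.0,
                                       synthetic_amount=50.0)
```

Under the one-peptide-per-protein assumption, the same figure is taken as the amount of the corresponding protein in the sample.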
There are several methods known in the art to differentially isotopically label a reference peptide and its synthetic reference peptide. In a first approach, the reference peptide carries the uncommon isotope and the synthetic counterpart carries the natural isotope. In this approach the synthetic reference peptides can be efficiently chemically synthesized with their natural isotopes in large-scale preparations. To label the reference peptide with an uncommon isotope, any of the hereinabove mentioned methods to differentially isotopically label a peptide with an uncommon isotope can be applied (in vivo labelling, enzymatic labelling, chemical labelling, etc.). One example of in vivo labelling is to incorporate the commercially available deuterated methionine CH3-SCD2-CD2-CH-(NH2)-COOH, adding 4 amu to the total peptide mass. Alternatively, synthetic reference peptides could also contain deuterated arginine H2NC-(NH)-NH-(CD2)3-CD-(NH2)-COOH, which would add 7 amu to the total peptide mass. It should be clear to one of skill in the art that every amino acid of which deuterated or 15N or 13C forms exist can be considered in this protocol. Many other methods can be used. Thus, in a preferred embodiment, the quantitative analysis of at least one protein in one sample comprising proteins comprises the steps of: (a) preparing a protein peptide mixture wherein the peptides carry an uncommon isotope (e.g. a heavy isotope); (b) adding to the protein peptide mixture a known amount of a synthetic reference peptide carrying natural isotopes (e.g. a light isotope); (c) isolating a subset of peptides comprising C-terminal ends of said proteins using the methods of the invention; (d) determination by mass spectrometry of the ratio between the peak heights of the reference peptide versus the synthetic reference peptide; and (e) calculation of the amount of protein, represented by the reference peptide, in the sample comprising proteins.
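As a small worked example of the mass offsets mentioned above, the following Python sketch converts the nominal label shifts (4 amu per deuterated methionine, 7 amu per deuterated arginine) into the expected spacing of the twin peak on the m/z axis, which is the mass difference divided by the charge state of the ion. The dictionary keys and the function name are hypothetical labels chosen for this illustration, and nominal rather than exact isotopic masses are used.

```python
# Nominal mass shift per labelled residue, as quoted in the text (in amu).
LABEL_SHIFT_AMU = {"Met-d4": 4, "Arg-d7": 7}


def twin_peak_spacing(label: str, labelled_residues: int, charge: int) -> float:
    """Expected m/z distance between the light and the heavy peak."""
    if charge < 1:
        raise ValueError("charge state must be a positive integer")
    mass_difference = LABEL_SHIFT_AMU[label] * labelled_residues
    return mass_difference / charge


spacing_met_1plus = twin_peak_spacing("Met-d4", labelled_residues=1, charge=1)
spacing_arg_2plus = twin_peak_spacing("Arg-d7", labelled_residues=1, charge=2)
```

For a doubly charged ion the peaks therefore lie closer together than the nominal mass difference, which matters when choosing a label large enough to resolve the pair.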
In a preferred embodiment, where the peptide isolation involves the primary and secondary chromatographic runs in between the modification with the agent of the invention, the isolated unaltered peptides can be separately analysed in each of the unaltered peptide fractions obtained in the secondary runs.
Also, the above methods can equally be applied in a mode whereby a reference peptide is labelled with the natural isotope and its synthetic reference peptide counterpart is labelled with an uncommon isotope. The above methods of the present invention enable quantification of the amount of protein in a sample comprising proteins and can generally be used to quantify from one up to hundreds of proteins in the sample. In a particular embodiment, each synthetic reference peptide is added in an amount equimolar to the expected amount of its reference peptide counterpart.
The methods provided in the present invention to quantify at least one protein in a sample comprising proteins can be broadly applied to quantify proteins of different interest. For example, diagnostic assays can be developed by which the level of one or more proteins is determined in a sample by making use of the present invention. Further applications for methods to isolate subsets of peptides from protein peptide mixtures are discussed in WO 02/077016 (especially p. 21, l. 22 through p. 37, l. 21 thereof, herein incorporated by reference) and the skilled person will be able to extend their applicability to peptide subsets obtainable in the present invention. In a particularly interesting application, the methods of the invention, including the above described methods which allow for qualitative and/or quantitative comparisons between as well as within different protein samples (e.g., samples representing different physiological states or different tissues or exposed to different conditions, etc.), can be used for the proteomics study of protein processing ("degradomics"). For example, protein processing or degradation in vivo or in cell culture may produce protein fragments displaying novel C-terminal ends. Accordingly, the methods of the present invention, which can in general enrich for and isolate peptides comprising C-terminal ends of proteins, can be advantageously used to follow the appearance of novel C-terminal end peptides which can be identified and can be indicative of novel proteolytic processing, and/or follow the changes in absolute or relative quantity of known C-terminal end peptides, representative of known cleavage events. Such methods may advantageously complement degradomics analysis based on the study of novel N-terminal peptides.
In another interesting application, the methods of the invention, including the above described methods which allow for qualitative and/or quantitative comparisons between as well as within different protein samples, can be used for the proteomics study of proteins from one species on the background of proteins from another species ("xenoproteomics"). For example, peptides comprising C-terminal ends of proteins of one species may be specifically recognised and identified vis-a-vis peptides comprising C-terminal ends of proteins from another species. By means of example and not limitation, this method may be used to specifically identify human proteins in body fluids of mice xenografted with human tissues, e.g., primary human tumours, so as to find potential biomarkers.
It shall be appreciated that the above listed applications for the methods and devices of the invention serve to illustrate, but not limit, the potential advantages of the peptides isolated according to the invention, and in particular representative peptides comprising C-terminal portions of proteins of protein samples, e.g., biological samples. It shall be understood that such peptides may be employed in essentially any proteomic application of interest.
The invention is further illustrated with examples that are not to be considered limiting.
The second group of aspects of the invention, described in detail here below, concerns a method for protein identification and optionally quantification from a protein mixture comprising the steps:
(a) fragmenting a mixture of proteins (PM) to obtain a protein peptide mixture (PPM);
(b) isolating from the protein peptide mixture PPM:
(ba) peptides comprising the N-terminal ends of proteins of the mixture of proteins PM (i.e., N-terminal peptides), and/or
(bb) peptides comprising the C-terminal ends of proteins of the mixture of proteins PM (i.e., C-terminal peptides);
(c) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
(d) identifying and optionally quantifying one or more N-terminal and/or C-terminal peptides from one or more of said fractions, whereby said identified N-terminal and/or C-terminal peptides represent one or more proteins from the mixture of proteins PM.
As noted, the protein mixture (PM) may be subjected to chemical and/or enzymatic pre-treatment(s) such as to desirably block or alter selected moieties before and/or following fragmentation. For example, the mixture of proteins (PM) or the protein peptide mixture (PPM) may be reacted with one or more modifying reagents, simultaneously or sequentially in any suitable order, which reagents may preferably fall into the following classes: modifiers of primary amines, particularly modifiers of α-NH2 groups and/or Lys ε-NH2 groups; or modifiers of cysteine residues. After treatment with one or more modifying reagents, the sample may optionally be purified using known techniques, such as solvent evaporation, washing, filtration, chromatographic techniques, etc.
Suitable blocking reagents, as well as methods and conditions for attaching and detaching protecting groups, will be clear to the skilled person and are generally described in standard handbooks of organic chemistry, such as "Protecting Groups", P. Kocienski, Thieme Medical Publishers, 2000; Greene and Wuts, "Protective groups in organic synthesis", 3rd edition, Wiley and Sons, 1999; each incorporated herein by reference in its entirety. Preferably, Cys -SH groups in the protein mixture (PM) or protein peptide mixture (PPM) are protected to avoid their reactivity, in particular oxidation, throughout the method. Typically, the protein mixture (PM) or protein peptide mixture (PPM) is first treated with a reducing agent known per se, such as, e.g., β-mercaptoethanol, dithiothreitol (DTT), dithioerythritol (DTE) or a suitable trialkylphosphine, inter alia tris(2-carboxyethyl)phosphine (TCEP), to quantitatively reduce any oxidised -SH groups, e.g., disulphide bridges. The -SH groups are subsequently protected with a blocking reagent that reacts selectively with Cys side chains and presents a non-reactive substituent for subsequent conditions. By means of example and not limitation, -SH groups may be converted to acetamide derivatives by treatment with iodoacetamide in denaturing buffers (e.g., guanidinium ion- or urea-containing buffers). Other blocking reagents, such as N-substituted maleimides (e.g., N-ethylmaleimide), acrylamide, N-substituted acrylamide or 2-vinylpyridine, may alternatively be used.
In some embodiments, primary amino groups ("primary amino" alone or in combination refers to a group of formula -NH2, optionally in any dissociation or protonation state such as -NH3+), such as particularly α-NH2 groups and/or side chain primary amino groups including Lys ε-NH2 groups in the protein mixture (PM) or protein peptide mixture (PPM) may need to be modified to block their reactivity and/or to neutralise or otherwise alter the charge thereof, using a suitable reagent that reacts selectively with the desired primary amino groups and presents a non-reactive substituent for subsequent conditions. The reagent may be generally substituted once or twice on each so-modified primary amine (i.e., -NH2 gives -NHZ or -NZ2, where Z is the substituent introduced by said reagent).
In a non-limiting and preferred example, primary amines may be protected by acylation, more preferably acetylation, using reagents known per se, such as, e.g., acetyl N-hydroxysuccinimide. Acylation of primary amino groups can avoid protonation of so-modified groups under conditions of the present methods, thereby advantageously neutralising the charge of so-modified amino groups. Other suitable NH2-modifying reagents have been extensively described in the art, for example, in Regnier et al. 2006 (Proteomics 6: 3968-3979). During modification of -NH2 groups with acyl such as acetyl, the acyl moiety may occasionally also be introduced on the -OH group of Ser and/or Thr. Such ester bonds are preferably subsequently broken by alkali hydrolysis at conditions that do not affect the acylation of the -NH2 groups. In embodiments, the blocking step performed on the protein mixture (PM) should block at least the N-terminal α-NH2 groups thereof, such as to introduce a charge difference between the N-terminus of the Cα-blocked N-terminal peptides and the free α-NH2-containing N-termini of internal and C-terminal peptides as generated during cleavage. Preferably, said blocking step may also protect any side chain primary amino groups such as Lys ε-NH2 groups in the protein mixture (PM), which may also allow isolation of the C-terminal peptides containing a Lys, thereby increasing the representation of the parent proteins. Blocking reagents such as acetyl N-hydroxysuccinimide are capable of blocking both α-NH2 and side chain amino groups. A protein peptide mixture may be obtained by fragmentation of a mixture of proteins, such as, e.g., by fragmentation of all or a fraction of proteins present in and/or isolated from a biological sample after the sample has been removed from its biological source.
The invention in particular analyses N- and/or C-terminal peptides of proteins. To ensure optimal resolution and characterisation of said peptides, it is desirable that substantially all N-terminal peptides (or C-terminal peptides) generated by fragmentation from individual molecules of a given protein have the same length, i.e., that fragmentation generating such N-terminal peptides (or C-terminal peptides) occurs at the same peptide bond in substantially all individual molecules of said protein.
This can be advantageously achieved when the mixture of proteins (PM) is fragmented preferentially at peptide bonds adjacent to one or more specific amino acid residue types (denoted as X1... Xn). A peptide bond adjacent to a given amino acid residue may be the N-terminally adjacent peptide bond, or the C-terminally adjacent peptide bond.
Preferably, a protein mixture (PM) will be fragmented at substantially all recited peptide bonds. Hence, the fragmentation would occur substantially quantitatively at peptide bonds N-terminally or C-terminally adjacent to amino acid residues of the one or more types X1... Xn.
To achieve a protein peptide mixture (PPM) displaying preferred average and/or median peptide lengths, the protein mixture (PM) may be advantageously fragmented adjacent to a relatively small number of amino acid residue types X1... Xn, such as at peptide bonds adjacent to 5 or fewer amino acid residue types (i.e., n≤5), more preferably n≤4, even more preferably n≤3, still more preferably n≤2, or preferably at peptide bonds adjacent to only 1 amino acid residue type (i.e., n=1). The one or more specific amino acid residue types X1... Xn adjacent to which fragmentation is contemplated herein may be selected from any amino acid residues, including but not limited to amino acids found in naturally occurring proteins, amino acids carrying a co- or post-translational modification, amino acids including a non-natural isotope, or amino acids further chemically and/or enzymatically altered prior to the fragmentation, etc.
A suitable frequency of cleavage may be preferably achieved when the fragmentation takes place adjacent to one or more of the 20 common amino acid residue types found in natural proteins and/or adjacent to one or more of residue types obtained from any of the 20 common amino acid residue types by suitable modification of the starting proteins. Accordingly, in a preferred embodiment, the mixture of proteins (PM) is fragmented preferentially at peptide bonds adjacent to one or more amino acid residue types X1... Xn chosen from the group consisting of: Gly, Pro, Ala, Val, Leu, Ile, Met, Cys, Phe, Tyr, Trp, His, Lys, Arg, Gln, Asn, Glu, Asp, Ser and Thr; optionally including a co- or post-translational modification, chemically and/or enzymatically altered prior to the fragmentation, or including a non-natural isotope, etc. Following fragmentation of the protein mixture (PM), N-terminal and/or C-terminal peptides are isolated from the resulting protein peptide mixture (PPM).
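The residue-specific fragmentation discussed above can be illustrated with a minimal in silico sketch in Python. The function name, the one-letter example sequence and the chosen residue sets are hypothetical; the sketch assumes quantitative cleavage at every recited bond and ignores any sequence context that may affect a real protease or chemical reagent.

```python
def fragment(protein: str, residues: set, side: str = "C") -> list:
    """Split a one-letter protein sequence at peptide bonds adjacent to `residues`.

    side="C" cuts the bond C-terminally adjacent to each listed residue
    (the residue ends a peptide); side="N" cuts the N-terminally adjacent
    bond (the residue starts a peptide).
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            if start < cut < len(protein):  # never emit empty fragments
                peptides.append(protein[start:cut])
                start = cut
    peptides.append(protein[start:])
    return peptides


# Hypothetical sequence; n=2 residue types (X1=Lys, X2=Arg), C-terminal cuts.
peptides = fragment("MKTAYRGLSDE", {"K", "R"}, side="C")
n_terminal_peptide, c_terminal_peptide = peptides[0], peptides[-1]
```

Because every molecule of the protein is cut at the same bonds, the first and last fragments have a fixed length, which is the property the text relies on for N-terminal and C-terminal peptides.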
Isolation of N-terminal and/or C-terminal peptides from the protein peptide mixture (PPM) requires that the majority of or preferably substantially all N-terminal and/or C-terminal peptides are distinct from the majority of or preferably substantially all remaining peptides of the protein peptide mixture (PPM) with respect to one or more physical and/or chemical properties. Herein, N-terminal and/or C-terminal peptides are isolated from the remaining peptides of a protein peptide mixture (PPM) particularly on the basis of dissimilar net charge.
Techniques capable of separating peptides on the basis of net charge are generally known in the art and include without limitation ion exchange chromatography and zwitterionic ion exchange chromatography (see, e.g., WO 00/27496), chromatofocusing and various electrophoretic techniques such as inter alia isoelectric focusing.
Preferred techniques for use herein encompass ion exchange chromatography, including cation or anion exchange chromatography, preferably including strong cation exchange (SCX) or strong anion exchange (SAX) chromatography. The choice of particular separation method may be made by a skilled person based on the expected difference between the net charges of N-terminal and/or C-terminal peptides of interest vis-a-vis the remaining peptides of a protein peptide mixture (PPM).
Non-limiting embodiments "Ea" to "Ee" contemplate preferred manners of endowing the majority of or substantially all N-terminal and/or C-terminal peptides with a net charge distinct from that of the majority of or substantially all remaining peptides of the protein peptide mixture (PPM).
In a preferred embodiment "Ea", the protein mixture (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types X1...Xn that are basic. When the protein mixture (PM) is so-fragmented, most C-terminal peptides will not include a basic amino acid (unless a basic amino acid was the actual C-terminal residue of the respective protein), whereas most or all N-terminal and internal peptides will comprise a basic amino acid as their last residue. Therefore, under conditions where the side chain moiety of said basic amino acids is protonated, the net charge of C-terminal peptides will in general be lower than the net charge of N-terminal and internal peptides. This general difference between the net charge of C-terminal peptides vis-a-vis the remaining peptides allows for isolating or enriching the C-terminal peptides from the protein peptide mixture (PPM).
Particularly suitable conditions under which the C-terminal peptides may be enriched are where the majority of or substantially all α-C(=O)OH groups of the peptides are dissociated, i.e., -C(=O)O-; the majority of or substantially all α-NH2 groups of the peptides are protonated, i.e., -NH3+; and the basic side chain moiety of the majority of or substantially all basic amino acids adjacent to which the protein mixture (PM) was fragmented are protonated. Under said conditions, C-terminal peptides will in general have about zero net charge, while N-terminal and internal peptides will in general have net charge of about +1. Preferably, the conditions may also be such that the acidic side chain moiety of the majority of or substantially all acidic amino acids present in the peptides (particularly the -COOH moiety of Asp and Glu) is not dissociated. This can reduce the confounding effect of such acidic moieties on the method. Conditions as above may preferably encompass pH of about 4.0 or lower, preferably of about 3.0 or lower, such as, e.g., between 2.5 and 4.0, more preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.5, e.g., between 2.75 and 3.25, and still more preferably about 3, such as, e.g., 2.80, 2.85, 2.90, 2.95, 3.0, 3.05, 3.10, 3.15 or 3.20. Under above conditions, C-terminal peptides can be preferably isolated using cation exchange chromatography. Herein, positively charged peptides, most notably N-terminal and internal peptides, will be more strongly retained by the stationary phase, whereas substantially uncharged peptides, most notably C-terminal peptides, will elute faster and can be recovered from eluate. Preferably, C-terminal peptides can be isolated using SCX.
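The charge bookkeeping described above can be made concrete with a deliberately simplified Python sketch. It is an idealised integer model, not a pKa-based calculation: it assumes every α-carboxyl group contributes -1, every free α-amino group contributes +1, every Lys, Arg and His side chain contributes +1, and Asp and Glu side chains are undissociated and contribute nothing. The function name and the example peptides are hypothetical.

```python
BASIC = frozenset("KRH")  # assumption: all three basic side chains are protonated


def net_charge_low_ph(peptide: str) -> int:
    """Idealised net charge of an unmodified peptide at about pH 3."""
    alpha_amino = 1                               # free alpha-NH3+
    side_chains = sum(aa in BASIC for aa in peptide)
    alpha_carboxyl = -1                           # dissociated alpha-COO-
    return alpha_amino + side_chains + alpha_carboxyl


# Hypothetical peptides from a cleavage after Lys/Arg: N-terminal,
# internal and C-terminal, in that order.
tryptic = ["MK", "TAYR", "GLSDE"]
charges = {p: net_charge_low_ph(p) for p in tryptic}

# Peptides with zero net charge are not retained on a cation exchanger.
unretained = [p for p in tryptic if charges[p] == 0]
```

In this toy case only the C-terminal peptide carries no net charge, mirroring the separation on a cation exchange column that the text describes; peptides containing additional basic residues (e.g. a missed cleavage or a His) would score above +1 in the same model.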
In an advantageous development of the embodiment "Ea" (herein termed embodiment "Eaa"), the α-NH2 group of proteins in the protein mixture (PM) may be blocked prior to fragmentation, e.g., acylated preferably acetylated, to prevent protonation thereof. Consequently, N-terminal peptides produced by fragmentation C-terminally adjacent to basic amino acid residues will, under the above conditions, generally display net charge similar to C-terminal peptides (since the blocked α-NH2 group of N-terminal peptides is not charged). Co-isolation of N- and C-terminal peptides from the protein peptide mixture (PPM) is thus possible thereby potentially increasing the confidence of subsequent protein identification.
Fragmentation of the protein mixture (PM) may take place C-terminally adjacent to all basic amino acid residue types, preferably to all Arg, Lys and His (n=3, X1=Arg, X2=Lys, X3=His), or C-terminally adjacent to only some basic amino acid types, such as to only Arg (n=1, X1=Arg), or to only Lys (n=1, X1=Lys), or to only Arg and Lys (n=2, X1=Arg, X2=Lys). Where fragmentation does not take place C-terminally adjacent to each basic amino acid residue type, the charge of some or all basic amino acids after which the fragmentation does not occur may be preferably neutralised by suitable side chain modification - such as, e.g., by acetylation of Lys, by modification of His by diethylpyrocarbonate, or by modification of Arg by phenylglyoxal - such that the presence of said amino acids in the peptides does not alter the overall net charge of the latter.
Preferred embodiments, wherein fragmentation by trypsin or a trypsin-like protease occurs C-terminally adjacent to the basic amino acids Arg and Lys (if not blocked), are detailed in the Summary section.
Accordingly, in a particularly preferred embodiment "Eb" a protein mixture (PM) is fragmented by trypsin or a trypsin-like protease to yield a protein peptide mixture (PPM). Then, peptides having about zero net charge are isolated from the protein peptide mixture (PPM) under conditions wherein the majority of or substantially all α-C(=O)OH groups of the peptides of the protein peptide mixture (PPM) are dissociated; the majority of or substantially all α-NH2 groups of said peptides are protonated; and the majority of or substantially all Lys and Arg side chains of said peptides are protonated. So isolated peptides mostly represent C-terminal peptides derived from the parent proteins. Preferably, the peptides are isolated using cation exchange chromatography, more preferably SCX. Preferably, the peptides are isolated at pH as recited above, particularly at pH between 2.5 and 4.0, more preferably about 3.
In another particularly preferred embodiment "Ec", at least the α-NH2 groups, and possibly also the side chain primary amino groups particularly including the ε-NH2 groups of Lys, in the proteins of a protein mixture (PM) are blocked, preferably acylated, more preferably acetylated, and the so-modified protein mixture (PM) is fragmented by trypsin or a trypsin-like protease to yield a protein peptide mixture (PPM). It is understood that if ε-NH2 groups of Lys are blocked, trypsin will usually not cleave thereafter. Then, peptides having about zero net charge are isolated from the protein peptide mixture (PPM) under conditions wherein the majority of or substantially all α-C(=O)OH groups of the peptides of the protein peptide mixture (PPM) are dissociated; the majority of or substantially all α-NH2 groups of said peptides are protonated; and the majority of or substantially all Arg and Lys (if not blocked) side chains of said peptides are protonated. So isolated peptides mostly represent N- and C-terminal peptides derived from the starting proteins. Preferably, the peptides are isolated using cation exchange chromatography, more preferably SCX. Preferably, the peptides are isolated at pH as described above, particularly at pH between 2.5 and 4.0, more preferably about 3.
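Embodiment "Ec" can be walked through with a short, self-contained Python sketch. It is an idealised model under stated assumptions: all α-NH2 and Lys ε-NH2 groups of the intact protein are acetylated, cleavage therefore occurs only after Arg, each α-carboxyl group counts as -1, each free α-amino group and each Arg side chain counts as +1, and His is absent from the example. The function names and the sequence are hypothetical.

```python
def cleave_after(protein: str, residues: set) -> list:
    """Cut C-terminally adjacent to each listed residue (never after the last one)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein[:-1]):
        if aa in residues:
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides


def net_charge(peptide: str, charged_basic: set, free_alpha_amine: bool) -> int:
    """Idealised net charge at about pH 3 (acidic side chains undissociated)."""
    amine = 1 if free_alpha_amine else 0
    return amine + sum(aa in charged_basic for aa in peptide) - 1


protein = "MKTAYRGLSDERQLKA"            # hypothetical sequence
peptides = cleave_after(protein, {"R"})  # acetylated Lys is not cleaved

# Only the first peptide carries the blocked (acetylated) protein N-terminus;
# acetylated Lys is neutral, so Arg is the only charged side chain counted.
charges = [net_charge(p, {"R"}, free_alpha_amine=(i > 0))
           for i, p in enumerate(peptides)]
zero_charge = [p for p, q in zip(peptides, charges) if q == 0]
```

In this example the zero-charge selection returns the first and the last fragment, i.e. the N-terminal and the C-terminal peptide, while the internal peptide retains a charge of +1.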
In another embodiment "Ed", the protein mixture (PM) may be fragmented preferentially at peptide bonds C-terminally adjacent to one or more specific amino acid residue types X1...Xn that are acidic. The term "acidic amino acid" generally refers to amino acids, particularly α-L-amino acids, wherein the dissociation constant pKA of their side chain is < 5, preferably < 4 or lower. Particular acidic amino acids include Asp and Glu, which comprise a side chain carboxyl moiety.
When the protein mixture (PM) is fragmented preferentially at peptide bonds C-terminally adjacent to acidic amino acid residues, most C-terminal peptides will not include an acidic amino acid (unless an acidic amino acid was the actual C-terminal residue of the respective protein), whereas most or all N-terminal and internal peptides will comprise an acidic amino acid as their last residue. Therefore, under conditions where the side chain moiety of said acidic amino acids is dissociated, the net charge of C-terminal peptides will in general be higher than the net charge of N-terminal and internal peptides. Preferably, the conditions may also be such that the basic side chain moiety of the majority of or substantially all basic amino acids present in the peptides (particularly Lys, Arg and His) is not protonated; this can reduce the confounding effect of such basic moieties on the method. This general difference between the net charge of C-terminal peptides vis-a-vis the remaining peptides allows for isolating or enriching the C-terminal peptides from the protein peptide mixture (PPM), e.g., by SAX.
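The reasoning of embodiment "Ed" can be illustrated with a short in-silico digest. This is only a sketch under stated assumptions: the helper names and the example protein sequence are hypothetical, cleavage after Asp/Glu is taken to be complete, and the charge model assumes that all carboxyl groups (α-COOH and Asp/Glu side chains) are dissociated while the α-NH2 group and all basic side chains are uncharged.

```python
import re


def cleave_after_acidic(protein):
    """Split a sequence C-terminally to every Asp (D) or Glu (E).

    Assumes complete cleavage; the last piece is the C-terminal peptide.
    """
    return re.findall(r"[^DE]*[DE]|[^DE]+$", protein)


def negative_charge_model(peptide):
    """Net charge when all carboxyl groups are dissociated and amines are neutral."""
    return -(1 + peptide.count("D") + peptide.count("E"))


protein = "MKTADLVEAGRSDPLW"  # hypothetical sequence
peptides = cleave_after_acidic(protein)
charges = [negative_charge_model(p) for p in peptides]

for peptide, charge in zip(peptides, charges):
    print(peptide, charge)
```

In this example every N-terminal and internal peptide ends in an acidic residue and carries a charge of -2, whereas the C-terminal peptide carries -1, so it is the least negatively charged species and would be expected to bind an anion exchanger most weakly.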
When isolation of N-terminal peptides is intended, methods involving blocking of the α-NH2 group of proteins before fragmentation (such as, e.g., in above embodiments "Eaa" or "Ec") may be preferred, since they also recover N-terminal peptides from proteins in which the α-NH2 group is acetylated or otherwise blocked in nature, thereby improving representation of the parent proteins.
In a yet further embodiment "Ee", the α-NH2 group and/or the α-COOH group of proteins in the protein mixture (PM) may be modified to introduce thereon a moiety having one or more positive charges (e.g., a basic moiety) or one or more negative charges (e.g., an acidic moiety such as a sulphonate moiety) wherein said charges are present at least under conditions of subsequent peptide separation, e.g., using ion exchange chromatography. Following fragmentation of the so-modified mixture (PM), the newly generated free α-NH2 groups and/or α-COOH groups may be optionally and preferably modified to introduce thereon a moiety having one or more charges opposite to those added on the α-NH2 group and/or α-COOH group of the N-terminal and/or C-terminal peptides. The general difference between the net charge of N-terminal and/or C-terminal peptides vis-a-vis the remaining peptides, due to the addition of the charged moieties, allows for isolating or enriching the N-terminal peptides and/or C-terminal peptides from the protein peptide mixture (PPM). Advantageously, introduction of charged moieties onto N-terminal and/or C-terminal peptides may ensure better ion fragmentation of so-charged peptide species (e.g., as compared to acetylated N-terminal peptides) during MS analysis.
In an embodiment, the charged moiety introduced onto N-terminal and/or C-terminal peptides may be a weak base or a weak acid moiety. By adequate choice of solvent pH, such moieties may be endowed with charge when required (e.g., to enable ion exchange-based peptide sorting as above) but may be kept uncharged if presence of such charge would be undesired in other separation steps. Inclusion of a weak base moiety can be particularly advantageous since it can greatly facilitate ion fragmentation of such peptides during MS. In another embodiment, the introduced charged moiety may be a strong base or a strong acid moiety, which maintains its charge substantially irrespective of the solvent pH.
It shall be appreciated that the above described methods for isolating N-terminal and/or C-terminal peptides from a protein peptide mixture (PPM) may be complemented with further approaches. By means of illustration and not limitation, N-terminal and C-terminal peptides may be enriched on the basis of their distinct net charge as described in embodiments "Eaa" or "Ec". Subsequently, N-terminal peptides may be further isolated by removing peptides containing a free α-NH2 group (which mostly include C-terminal peptides and non-tryptic peptides) using affinity separation or chromatography with capture agents having affinity to primary amines. Capture agents having strong affinity for primary amino groups, in particular protonated primary amino groups, include without limitation crown ethers, such as, e.g., 18-crown-6 ether or derivatives thereof.
In an embodiment, the isolated N-terminal and/or C-terminal peptides are subsequently separated into fractions of peptides by a two- or more-dimensional (multidimensional) separation process, as described in the Summary section.
In a preferred embodiment, one or more or all separation steps of the multidimensional separation process may be by chromatography (i.e., multidimensional chromatography). In preferred embodiments, the separation process may be multidimensional chromatography, such as, e.g., 4D-chromatography, 3D-chromatography or two-dimensional chromatography, preferably orthogonal chromatography.
In present methods, chromatography preferably employs liquid mobile phase. In a preferred embodiment, the chromatography may be columnar, i.e., wherein the stationary phase is deposited or packed in a column. In a preferred embodiment, the chromatography is HPLC. Columns and conditions for performing HPLC separation are generally known to the skilled person, and described in, e.g., Practical HPLC Methodology and Applications, Bidlingmeyer, B. A., John Wiley & Sons Inc., 1993.
Stationary phase for use in chromatography may commonly comprise solid support functionalised with one or more moiety types intended for interaction with analytes and/or for allowing formation of a liquid stationary phase film on the support. The requirements for solid supports for use in separation methods including chromatography are generally known in the art, being solid materials that are structurally stable and chemically inert under conditions of separation and which exhibit low or no non-specific interactions with analytes. Solid supports should allow for the immobilisation thereon of one or more functionalising moieties. Methods for immobilisation of moieties of interest to solid supports, and optionally the choice of spacers or linkers therefor, are well known in the field; see, e.g., Immobilized Affinity Ligand Techniques, Hermanson, G. T. et al., Academic Press, Inc., 1992; Combinatorial Chemistry, Eds: Bannwarth, Willi, Hinzen, Berthold, Wiley-VCH.
Solid supports may be made from organic or inorganic materials or hybrid organic/inorganic materials, and may be polymer-based materials. Non-limiting examples of solid supports include ones prepared from a native polymer, such as cross-linked carbohydrate material, including, e.g., agarose, agar, cellulose, dextran, chitosan, konjac, carrageenan, gellan, alginate, etc.; or ones prepared from a synthetic polymer or copolymer, such as cross-linked synthetic polymers, e.g., styrene or styrene derivatives, divinylbenzene, acrylamides, acrylate esters, methacrylate esters, vinyl esters, vinyl amides, etc.; or solid supports prepared from an inorganic polymer, such as silica, which is particularly suitable for inter alia HPLC.
Inorganic porous and non-porous supports are well known in this field, some of which are commercially available. Examples of commercially available matrix materials include, but are not limited to, those based on silica, polystyrene, POROS®, sepharose®, sepharopore™, and other variants thereof. A skilled person can choose suitable solid support material based on the type of separation, expected unwanted non-specific interactions, capacity, loadability and flow characteristics, etc.
A solid support can be in the form of, e.g., beads, pellets, resin, small particles, a membrane, a frit, a sintered cake, pillars in microfabricated structures or a monolith or any other form desirable for use. The solid support particles can have, for example, a spherical shape, a regular shape or an irregular shape. Depending on the type of separation or type of chromatography, suitable particle sizes may be in the diameter range of about 1-500 μm, such as about 2-200 μm or about 5-100 μm, e.g., about 5-50 μm. Size of particles for use in HPLC may preferably be in the diameter range of about 1-10 μm, preferably about 5 μm. The skilled person in this field can easily choose suitable particle sizes and porosity depending on the process to be used. Depending on the type of separation, solid supports may be comprised in a chromatography column as a chromatography matrix, in a solid phase extraction (SPE) cartridge, in a magnetic bead, in a centrifugable or filterable bead or in any other known format suitable for separations.
In an embodiment, one or more chromatographic separation steps may involve reversed phase (RP) liquid chromatography, preferably RP-HPLC. Exemplary stationary phases for RP chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as listed in the Summary section. Commercially available chromatography columns functionalised with moieties suitable for RP-HPLC may be used in the present method, such as without limitation ones summarised in Table 3:
Table 3.
[Table 3 is provided as image imgf000089_0001 in the original publication.]
Dionex Corp. (Sunnyvale, California), Phenomenex Corp. (Torrance, California), Agilent Technologies (Santa Clara, California), Waters Corp. (Milford, Massachusetts).
Typically, in RPLC the loading mobile phase is aqueous in nature comprising a (low) percentage of organic modifier (e.g., ACN or methanol). The skilled person will be aware of the percentages of added modifier, the applied flow rates, temperatures, etc. used in RPLC. After loading, the peptides are separated using a solution comprising constant or gradually increasing (gradient) percentages of a water miscible solvent with hydrophobic properties such as acetonitrile (ACN), an alcohol (e.g., methanol, ethanol) or other solvents known in the art of reversed phase separation.
In an embodiment, one or more chromatographic separation steps may involve hydrophilic interaction chromatography (HILIC), such as ZIC-HILIC. Exemplary stationary phases for HILIC chromatography may include appropriate solid supports (e.g., porous or non-porous silica) functionalised with moieties such as listed in the Summary section.
Commercially available chromatography columns functionalised with moieties suitable for HILIC may be used in the present method, such as without limitation ones summarised in Table 4:
Table 4.
[Table 4 is provided as image imgf000090_0001 in the original publication.]
Sequant AB (Umeå, Sweden), Tosoh Bioscience (Tessenderlo, Belgium), PolyLC Inc. (Columbia, Maryland).
Typically, in HILIC the loading mobile phase is hydrophobic in nature (e.g., ACN) comprising a (low) percentage of water in order to generate a hydrophilic stationary phase. The skilled person will be aware of the percentages of water present, the applied flow rates, temperatures, etc. used in HILIC. After loading, the peptides are separated using a solution comprising constant or gradually increasing (gradient) percentages of water or of a buffer with hydrophilic properties.
The inventors have realised that the high percentage of hydrophobic solvent (e.g., ACN) needed to load HILIC columns may precipitate some peptides. Hence, in an advantageous development, the inventors have realised a manner to load peptides onto HILIC columns by adjoining in-line and upstream thereto a short RPLC column (e.g., a short C18 column). The length of said short column may be, e.g., less than about 5 cm, e.g., less than about 4 cm, more preferably less than about 3 cm, e.g., less than about 2.5 cm, even more preferably less than about 2 cm, such as, e.g., between about 0.5 cm and about 2 cm, more preferably between about 1 cm and about 2 cm. Peptides are first loaded onto the RPLC column in a predominantly aqueous solution (e.g., between about 80% and about 100% aqueous solution, preferably between about 90% and 100% and more preferably about 100% aqueous solution) whereby the peptides remain bound to the RPLC stationary phase. Following loading, the peptides are eluted from the RPLC column onto the in-line downstream HILIC column using a highly hydrophobic solvent (e.g., at least 70% hydrophobic, more preferably at least 80% or at least 85% hydrophobic solvent, such as, e.g., about 85% ACN). Given that highly hydrophobic solvents are strong eluents in RPLC but weak eluents in HILIC, the peptides will focus onto the HILIC columns and may be thereafter suitably separated thereon.
Table 5 lists several preferred but non-limiting setups of two-dimensional chromatography preferred as the multidimensional separation process of the invention, which display particular orthogonal properties and effective resolution of the N-terminal and/or C-terminal peptides:
Table 5
[Table 5 is provided as image imgf000091_0001 in the original publication.]
As mentioned, RP-HPLC at low pH may be preferably used as 2nd (or ultimate) dimension due to its advantageous compatibility with downstream MS analysis. Similarly, HILIC may be used as 2nd (or ultimate) dimension due to MS compatibility.
In another embodiment, the isolated N-terminal and/or C-terminal peptides are subsequently separated into fractions of peptides by a 1D long-column chromatography separation, preferably liquid chromatography, more preferably HPLC.
1D long-column chromatography may use stationary phases, mobile phases, solid supports, functionalising moieties, etc. as described above in relation to multidimensional chromatography, with the distinction that the length of the column is increased. In preferred embodiments, 1D long-column chromatography may involve RPLC, preferably RP-HPLC, as taught above, such as without limitation C18 or phenyl-based RPLC. In another embodiment, 1D long-column chromatography may involve hydrophilic interaction chromatography (HILIC), such as ZIC-HILIC, optionally with RPLC pre-loading (e.g., C18 pre-loading), as described above. Preferably, in order to be used in a long column format, chromatographic beds may need to be characterized by an inherent high permeability and/or ability to withstand high pressures and/or temperatures. Particularly preferably, silica monolithic beds fulfil the former requirement. By means of suitable example and not limitation, Zorbax Stable Bond particles (Agilent) can be used at temperatures up to 90°C and can tolerate relatively high pressures. To reach the desired column length, commercially available columns can be coupled or columns can be constructed in one piece utilizing commercially available particles.
The methods and systems of the invention, and particularly of the second group of aspects of the invention, find particular use in proteomics applications. The N-terminal and/or C-terminal peptides recovered using the above methods are highly representative of and can thus identify the corresponding proteins in a starting sample.
In a preferred approach, separation, analysis and/or identification of peptides resolved herein may be performed using a mass spectrometer. Otherwise, said peptides may be analysed and/or identified using other methods such as, e.g., activity measurement in assays, analysis with specific antibodies, Edman sequencing, etc.
In an embodiment, peptides released (e.g., eluted) from the final step of the multidimensional separation process can be directly (on-line) fed to an analyser (such as, e.g., on-line LC/MS/MS). Otherwise, peptides may be collected in fractions which, optionally following additional manipulation (e.g., concentration and/or spotting onto a MALDI-matrix; or advantageously, mixing with matrix in a microtee prior to deposition on MALDI targets, thereby eliminating the need for concentration and manual spotting; etc.), can be fed to an analyser.
In a preferred embodiment, peptides resolved herein are analysed and identified using mass spectrometry, preferably high-throughput mass spectrometric (MS) techniques known per se, that can obtain precise information on the mass of the peptides and preferably also on (partial) amino acid sequence of the peptides (e.g., in tandem mass spectrometry, MS/MS; or in post source decay TOF MS). Such information can be employed in database searching to trace the peptides back to their parent proteins.
MS arrangements and instruments appropriate for peptide analysis are commonly known and may include, without limitation, matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS systems; MALDI-TOF post-source-decay (PSD) systems; MALDI-TOF/TOF systems; electrospray ionisation (ESI) 3D or linear (2D) ion trap MS systems; ESI triple quadrupole MS systems; ESI quadrupole orthogonal TOF systems (Q-TOF); or ESI Fourier transform MS systems; etc. Peptide ion fragmentation in tandem MS (MS/MS) may be achieved using manners established in the art, such as, e.g., collision induced dissociation (CID).
Algorithms and software exist in the art that compare experimental mass spectra and optionally also (partial) sequence information for the analysed peptides with a database of peptide masses/sequences predicted on the basis of sequencing information in protein and nucleic acid databases, and identify the corresponding peptides: e.g., ProFound, X! Tandem, (http://prowl.rockefeller.edu), MASCOT (http://www.matrixscience.com, Matrix Science Ltd. London), Sequest (http://fields.scripps.edu/sequest/; US 6,017,693; US 5,538,897), OMSSA (http://pubchem.ncbi.nlm.nih.gov/omssa/), etc. Starting from the known identity of so-detected peptides, the corresponding proteins can be easily found by sequence database searching using these or other software tools. Identification of N-terminal peptides can also benefit from the use of specialised N-terminally ragged databases to account for protein processing, as known in the art (e.g., Gevaert et al. 2003. Nat Biotechnol 21: 566-569; Martens et al. 2005. Proteomics 5: 3139-3204).
Generally, the herein disclosed methods may achieve identification of any number or even substantially all (i.e., comprehensive analysis) N- and/or C-terminal peptides present in starting protein peptide mixtures (PPM). Optionally, the methods may further encompass art-established technique(s) allowing the relative or absolute quantity of one or more proteins in the starting sample to be determined (see, e.g., WO 03/016861, WO 02/084250 or WO 2004/111636).
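The principle of matching an experimental peptide mass against predicted peptide masses can be sketched as below. This is a deliberately simplified illustration and does not reproduce the algorithm of any of the tools named above: the function names, the candidate list and the 10 ppm tolerance are assumptions, modifications such as acetylation are ignored, and real search engines additionally score fragment ion spectra.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added for the free termini


def monoisotopic_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_by_mass(observed_mass, candidates, tol_ppm=10.0):
    """Return candidate peptides whose predicted mass lies within tol_ppm."""
    hits = []
    for peptide in candidates:
        predicted = monoisotopic_mass(peptide)
        if abs(observed_mass - predicted) / predicted * 1e6 <= tol_ppm:
            hits.append(peptide)
    return hits


# Hypothetical predicted terminal peptides and one hypothetical observation.
candidates = ["AGLSVF", "PLW", "LVE"]
observed = monoisotopic_mass("PLW") + 0.001  # small measurement error
hits = match_by_mass(observed, candidates)
```

Because only one N-terminal and/or one C-terminal peptide is expected per protein, the candidate list searched in this way is much shorter than for a full digest.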
In a preferred embodiment, the methods and systems of the present invention may be employed to identify proteins differentially present between samples, more preferably biomarkers.
"Marker" or "biomarker" as used herein refer to a protein or polypeptide which is differentially present in a sample taken from subjects having a genotype or phenotype of interest and/or who have been exposed to a condition of interest (herein "query sample"), as compared to an equivalent sample taken from control subjects not having said genotype or phenotype and/or not having been exposed to said condition (herein "control sample"). Samples can be as disclosed above and may be broadly applied to compare for instance subcellular fractions, cells, tissues, biological fluids (e.g., nipple aspiration fluid, saliva, sperm, cerebrospinal fluid, urine, blood, serum, plasma, synovial fluid), organs and/or complete organisms.
A particularly relevant phenotype may be a pathological condition of interest in patients, such as, e.g., cancer, an inflammatory disease, autoimmune disease, metabolic disease, CNS disease, ocular disease, cardiac disease, pulmonary disease, hepatic disease, gastrointestinal disease, neurodegenerative disease, genetic disease, infectious disease or viral infection; vis-a-vis the absence of such conditions in healthy controls. Other comparisons may be envisaged between samples from, e.g., stressed vs. non-stressed conditions/subjects, drug-treated vs. non drug-treated conditions/subjects, benign vs. malignant diseases, adherent vs. non-adherent conditions, infected vs. uninfected conditions/subjects, transformed vs. untransformed cells or tissues, different stages of development, conditions of overexpression vs. normal expression of one or more genes, conditions of silencing or knock-out vs. normal expression of one or more genes, and so on.
The phrase "differentially present" refers to a demonstrable, preferably statistically significant, difference in the quantity and/or frequency of a protein or polypeptide in query samples as compared to control samples. For example, a marker may be a protein which is present at an elevated level or at a decreased level in query samples compared to control samples. A marker may also be a protein which is detected at a higher frequency or at a lower frequency in query samples compared to control samples. For example, a protein may be differentially present between two samples if the protein's quantity in one sample is at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900% or at least about 1000% of its quantity in the other sample; or if it is detectable in one sample but not detectable in the other sample. Alternatively or additionally, a protein may be differentially present between two sets of samples if the frequency of detecting the protein in one set of samples is at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900% or at least about 1000% of the frequency of detecting the protein in the other set of samples; or if the protein is detectable at a given frequency in one set of samples but is not detected in the second set of samples. The present methods may be employed to identify proteins differentially present between query and control samples, thereby identifying potential biomarkers.
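The percentage criteria above translate into a simple comparison. The following is a minimal sketch, assuming that a quantity (or detection frequency) of zero stands for "not detectable"; the function name and the example values are hypothetical, and no statistical test is included.

```python
def is_differentially_present(value_a, value_b, min_percent=120.0):
    """Compare two quantities (or two detection frequencies).

    Returns True if the larger value is at least min_percent % of the smaller
    one, or if the protein is detectable in only one of the two samples.
    Assumption: a value of zero or below means "not detectable".
    """
    a_detected = value_a > 0
    b_detected = value_b > 0
    if not a_detected and not b_detected:
        return False
    if a_detected != b_detected:
        return True
    high, low = max(value_a, value_b), min(value_a, value_b)
    return high * 100.0 >= min_percent * low


# Hypothetical quantities in a query and a control sample.
print(is_differentially_present(150.0, 100.0))                     # 150% of control
print(is_differentially_present(110.0, 100.0))                     # 110% of control
print(is_differentially_present(150.0, 100.0, min_percent=200.0))  # stricter cut-off
```

The same function can be applied to detection frequencies across two sets of samples, since the text uses the same percentage thresholds for both.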
In an embodiment, query samples and control samples may be analysed separately and abundances of corresponding peptides may be subsequently compared therebetween. This is generally known in the art as label-free profiling.
Preferably, to reduce variance between the to-be-compared samples, the samples may be analysed in the same separation experiment insofar as peptides derived from such samples are differentially labelled, allowing a given readout to be attributed to one of the starting samples. For example, samples (typically two samples) can be treated so that peptides derived from one sample contain one isotope and peptides obtained from the other sample contain another isotope of the same element. Such differentially-labelled samples may be analysed in the same separation experiment. The mass difference caused by the presence of other isotopes allows peaks corresponding to equivalent peptides from the differentially-labelled samples to be distinguished on MS, and their relative intensities to be compared. Hence, in an embodiment the protein peptide mixture (PPM) to be analysed according to the methods of the invention may be prepared by combining, preferably in equal amounts:
- a first protein peptide mixture (PPM1) derived from a first sample (e.g., a query sample), the peptides of mixture PPM1 being labelled with a first isotope; and
- a second protein peptide mixture (PPM2) derived from a second sample (e.g., a control sample), the peptides of mixture PPM2 being labelled with a second isotope different from the first isotope.
After resolving and analysing the peptides of the protein peptide mixture (PPM) using methods of the invention, one or more N-terminal and/or C-terminal peptides differentially present between the first and second samples can be identified by comparing the peak heights or areas of identical but differentially isotopically labelled peptides. The identity of the isolated peptide and its corresponding protein - potentially representing a biomarker - can then be determined.
The invention thus also provides a method for identification of proteins differentially present between a first protein mixture (PM1) (such as, e.g., a protein mixture from a query sample or a pool thereof) and a second protein mixture (PM2) (such as, e.g., a protein mixture from a control or reference sample or a pool of any thereof) comprising the steps:
(a) fragmenting the first protein mixture PM1 to obtain a first protein peptide mixture (PPM1) and fragmenting the second protein mixture PM2 to obtain a second protein peptide mixture (PPM2);
(b) labelling the first protein peptide mixture PPM1 with a first isotope and labelling the second protein peptide mixture PPM2 with a second isotope different from the first isotope;
(c) combining the protein peptide mixtures PPM1 and PPM2 from (b) and isolating from said combined peptide mixtures:
(ca) N-terminal peptides of proteins of the protein mixtures PM1 and PM2, and/or
(cb) C-terminal peptides of proteins of the protein mixtures PM1 and PM2;
(d) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
(e) identifying one or more N-terminal and/or C-terminal peptides which are differentially present in the protein peptide mixture PPM1 as compared to the protein peptide mixture PPM2, whereby said identified N-terminal and/or C-terminal peptides represent one or more proteins differentially present between the protein mixtures PM1 and PM2.
It shall be appreciated that preferred features disclosed in embodiments of this specification also apply to the above method.
The differential isotopic labelling of peptides in the first and second samples can be done in many different ways available in the art. A key element is that a particular peptide originating from the same protein in a first and second sample is identical, except for the presence of a different isotope in one or more amino acids of the peptide. Examples of pairs of distinguishable isotopes are 12C and 13C, 14N and 15N or 16O and 18O. Peptides labelled with such isotopes are chemically very similar, separate chromatographically in the same manner and also ionise in the same way. However, when fed into an analyser, such as MS, they will segregate into the distinguishable light and heavy peptide. The results of the mass spectrometric analysis of isolated peptides will thus be a plurality of pairs of closely spaced twin peaks, each twin peak representing a heavy and a light peptide. The ratios (relative abundance) of the peak intensities of the heavy and light peak in each pair are then measured. These ratios give a measure of the relative amount (differential presence) of that peptide (and its corresponding protein) in each sample. The peak intensities can be calculated in a conventional manner (e.g., by calculating the peak height or peak surface). Incorporation of isotopes into peptides can be obtained in multiple ways. In one approach proteins are labelled by growing cells in media supplemented with an amino acid containing the different isotopes (SILAC; see, e.g., Ong et al. 2002, Mol Cell Proteomics 1(5): 376-86).
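The measurement of heavy-to-light ratios for twin peaks can be sketched as follows. This is an illustrative example only: the function name, the peptide labels and the intensity values are hypothetical, and peak detection and pairing are assumed to have been done already.

```python
def heavy_to_light_ratios(peak_pairs):
    """Compute heavy/light intensity ratios for paired peptide peaks.

    peak_pairs maps a peptide identifier to (light_intensity, heavy_intensity),
    where each intensity is a peak height or peak area. Pairs without a
    detectable light peak are skipped because no finite ratio exists.
    """
    ratios = {}
    for peptide, (light, heavy) in peak_pairs.items():
        if light > 0:
            ratios[peptide] = heavy / light
    return ratios


# Hypothetical twin-peak intensities (arbitrary units).
peak_pairs = {
    "peptide_1": (1000.0, 2000.0),  # heavy twice as abundant as light
    "peptide_2": (500.0, 500.0),    # unchanged between samples
    "peptide_3": (0.0, 300.0),      # detected in the heavy sample only
}
ratios = heavy_to_light_ratios(peak_pairs)
```

A ratio close to 1 indicates similar abundance in both samples, while peptides seen in only one channel, such as `peptide_3` here, would need to be reported separately.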
In a preferred embodiment, the different isotopes can be incorporated by an enzymatic approach. For instance, labelling can be carried out by treating one sample comprising proteins with trypsin in H2 16O and the second sample comprising proteins with trypsin in H2 18O. Trypsin incorporates two oxygens of water at the COOH-termini of the newly generated sites during cleavage. Alternatively, treating protein peptide mixture post-digestion with trypsin in H2 16O or H2 18O leads to incorporation of one oxygen (16O or 18O, respectively) at the COOH-termini of the component peptides (see, e.g., US 2006/105415). To prevent back-exchange of the label after mixing 16O- and 18O-labelled samples, the samples and/or mixture thereof may be acidified, e.g., to pH less than about 5, more preferably less than about 4, even more preferably to pH about 3. In a preferred embodiment, the labelled samples may be added to an already acidic solution, such that after sample addition the desired acidic pH is attained and back-exchange is immediately prevented. Preferably, the acidification may be with TFA (trifluoroacetic acid), which is particularly compatible with downstream SCX sorting. Acidification, and particularly TFA-mediated acidification, provides an advantageous, SCX-compatible alternative to the guanidinium HCl/TCEP/IAA extraction used in the art to inactivate trypsin and prevent back-exchange, which conditions are considerably less SCX-compatible.
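The mass difference produced by 16O/18O labelling follows directly from the isotope masses. The sketch below is a worked calculation under stated assumptions: the function names are hypothetical, labelling is taken to be complete, and the isotope masses are rounded literature values.

```python
MASS_16O = 15.994915  # Da, monoisotopic mass of 16O (rounded)
MASS_18O = 17.999160  # Da, monoisotopic mass of 18O (rounded)


def label_mass_shift(n_oxygens):
    """Mass shift (Da) when n_oxygens 16O atoms are replaced by 18O."""
    return n_oxygens * (MASS_18O - MASS_16O)


def twin_peak_spacing(n_oxygens, charge):
    """Expected m/z spacing between light and heavy peaks for a given charge."""
    return label_mass_shift(n_oxygens) / charge


two_oxygen_shift = label_mass_shift(2)     # two oxygens incorporated, about 4.008 Da
one_oxygen_shift = label_mass_shift(1)     # one oxygen incorporated, about 2.004 Da
doubly_charged = twin_peak_spacing(2, 2)   # about 2.004 m/z units for a 2+ ion
```

A spacing of roughly 4 Da for singly charged ions, or 2 m/z units for doubly charged ions, is therefore what one would look for when pairing light and heavy peaks after incorporation of two oxygens.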
In another embodiment, differential isotopes can be incorporated into peptides by chemical labelling reactions known in the art. For example, peptides can be changed by Schiff base formation with deuterated acetaldehyde followed by reduction with normal or deuterated sodium borohydride. This reaction, which is known to proceed in mild conditions, may lead to the incorporation of a predictable number of deuterium atoms. Peptides will be changed either at the α-NH2 group, or at the ε-NH2 groups of lysines, or at both. Similar changes may be carried out with deuterated formaldehyde followed by reduction with NaBD4, which will generate a trideutero-methylated form of the amino groups. The reaction with formaldehyde could be carried out either on the total protein, incorporating deuterium only at lysine side chains, or on the peptide mixture, where both the α-NH2 and lysine-derived NH2 groups will be labelled.
In a further embodiment, the samples may be differentially labelled using the iTRAQ technology with isobaric reagents that tag amine groups, essentially as taught in Ross et al. 2004 (Mol Cell Proteomics 3(12): 1154-69). These tags are preferably used in conjunction with tandem MS mode in which each tag generates a unique reporter ion.
Having identified suitable biomarkers, the methods of the invention may also be employed in a diagnostic mode to detect the presence, absence or a variation in expression level of one or more biomarkers or a specific set of proteins indicative of a disease state (e.g., such as cancer, neurodegenerative disease, inflammation, cardiovascular diseases, viral infections, bacterial infections, fungal infections or any other disease) in a sample.
The above aspects of the invention are further illustrated with examples that are not to be considered limiting.
Example 1: Flow chart of isolation of peptides comprising C-terminal ends of proteins
Figure 1 shows a flow-chart of an exemplary method for isolating peptides comprising C-terminal ends of proteins from a protein peptide mixture. Preparation of peptides in the first step involves acetylation of ε-NH2 groups of lysines and of N-terminal α-NH2 groups, as well as alkylation of -SH groups of cysteines.
Example 2: Reaction of arginine side chain with hydroxy phenylglyoxal
A protein peptide mixture (synthetic or trypsin digest) is brought to pH 9 with NaOH in 100 mM sodium pyrophosphate. p-Hydroxyphenylglyoxal is added at a 100-fold molar excess in the same buffer at pH 9, and the reaction is allowed to proceed in the dark at room temperature for 3 hours. The reaction is subsequently quenched by desalting. The effectiveness of arginine modification can be assessed by measuring the absorbance at 340 nm (A340) to give moles of L-Arg modified (at pH 9, ε = 18,300 M-1 cm-1). Figure 2 depicts the stoichiometry of the reaction of p-hydroxyphenylglyoxal with the guanidino group of an arginine residue in a peptide chain (R, R depict in this figure the remaining portions of the peptide chain).
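The conversion of an A340 reading into moles of modified arginine follows the Beer-Lambert law. The sketch below is an illustrative calculation under stated assumptions: the function name, the 1 cm path length, the sample volume and the absorbance value are hypothetical, the reading is taken to be background-corrected, and the extinction coefficient quoted above is taken to apply per modified arginine.

```python
EPSILON_340 = 18300.0  # M^-1 cm^-1 at pH 9, value quoted in the text


def moles_arg_modified(a340, volume_litres, path_cm=1.0):
    """Moles of modified arginine from absorbance at 340 nm (Beer-Lambert law).

    concentration (M) = A340 / (epsilon * path length); moles = concentration * volume.
    """
    concentration = a340 / (EPSILON_340 * path_cm)
    return concentration * volume_litres


# Hypothetical reading: A340 = 0.366 in 1 ml of desalted sample, 1 cm cuvette.
modified = moles_arg_modified(0.366, 0.001)
print(f"{modified * 1e9:.1f} nmol Arg modified")
```

Comparing this amount with the moles of arginine expected in the starting peptide mixture gives an estimate of the modification efficiency.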
Example 3: Reaction of arginine side chain with nitromalondialdehyde
Protein peptide fractions from RP-HPLC (4 min wide) are dried and re-dissolved in 100 mM NaOH to which 3 mg of NMA (sodium salt) is added. The modification reaction is allowed to proceed for 2 hours at 30°C, and is subsequently quenched with 50 μl of 200 mM acetic acid. Figure 3 depicts the stoichiometry of the reaction of NMA with the guanidino group of an arginine residue in a peptide chain (R, R depict in this figure the remaining portions of the peptide chain).
Example 4: Pilot proteomic analysis of C-terminal peptides using NMA modification
Protein mixture isolated from a total cell lysate was processed for analysis and examined as follows:
- proteins of the protein mixture were modified by alkylation of cysteine residues and trideuteroacetylation of free amino groups (thereby preventing trypsin cleavage after lysine residues, and neutralising the positive charge of lysine residues, as well as neutralising the positive charge of free N-terminal α-NH2 groups); the so-modified protein mixture was digested with trypsin;
- the resulting protein peptide mixture was loaded onto SCX column at pH = 3;
- SCX flow-through fraction, containing N-terminal, C-terminal and blocked internal peptides, was collected; free amines were acetylated to prevent their side-reaction with NMA, methionine residues were oxidised to their sulfoxide; the peptide mixture was separated by RP-HPLC (1st separation) and fractions of 4 min wide were collected;
- these collected "primary" fractions were dried and re-dissolved in 100 mM NaOH to which 3 mg of nitromalondialdehyde (NMA) (MW = 157.06) was added; the modification reaction was allowed to proceed for 2 h at 30°C; subsequently, the reaction was stopped with 50 μl of 200 mM acetic acid;
- peptides of each so-reacted primary fraction were separated (2nd separation) under conditions identical to the 1st separation; 10 "secondary" fractions were collected from the 2nd separation of each primary fraction, including eight 30 second-wide fractions (said 8 fractions - numbered #1 to #8 - covering the whole collection interval of the respective primary fraction interval) and 2 "post fractions" of 1 min each, numbered #9 and #10;
- so-isolated secondary fractions of two consecutive primary fractions (primary fractions with retention times 36-40 min and 40-44 min, herein denoted fraction "36-40" and "40-44", respectively) were analysed by LC-MS/MS (XCT-Agilent) to identify constituent peptides.
From said 20 secondary fractions corresponding to the two primary fractions "36-40" and "40-44", a total of 213 peptides were identified, of which 100 corresponded to unique primary amino acid sequences of peptides. Of said 213 identified peptides, 124 (corresponding to 61 unique peptide sequences), i.e., 58%, were C-terminal peptides, while 89 (corresponding to 39 unique peptide sequences), i.e., 42%, were non-C-terminal peptides containing a -COOH arginine. Out of said 89 -COOH arginine containing peptides, 76 (i.e., 85%) were modified with NMA, while only 13 (15%) were not modified with NMA.
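The percentages quoted above can be re-derived from the reported peptide counts. The short Python tally below is only an arithmetic check on the numbers given in this example; the variable and function names are illustrative.

```python
# Peptide counts reported for the 20 secondary fractions of
# primary fractions "36-40" and "40-44".
total_identified = 213
c_terminal = 124            # C-terminal peptides (61 unique sequences)
arg_ending = 89             # non-C-terminal, -COOH arginine (39 unique)
arg_ending_nma = 76         # of those, modified with NMA
arg_ending_unmodified = 13  # of those, not modified with NMA
unique_total = 100
unique_c_terminal = 61
unique_arg_ending = 39


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100.0 * part / whole)


# The two classes account for every identified peptide.
assert c_terminal + arg_ending == total_identified
assert arg_ending_nma + arg_ending_unmodified == arg_ending
assert unique_c_terminal + unique_arg_ending == unique_total

print(percent(c_terminal, total_identified))          # C-terminal share
print(percent(arg_ending, total_identified))          # -COOH arginine share
print(percent(arg_ending_nma, arg_ending))            # NMA-modified share
print(percent(arg_ending_unmodified, arg_ending))     # unmodified share
```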
This demonstrates that the methods of the invention can effectively identify C-terminal peptides from protein mixtures: identification of as many as 61 unique C-terminal peptides from a complex peptide mixture was never before achieved. Moreover, modification with NMA is rather quantitative (only 15% of -COOH arginine containing peptides were not modified, see also Figure 4, black bars), and is therefore particularly suitable for quantitative modification of peptides in the invention. Further optimisation of reaction conditions may yet improve the quantitative nature of the reaction between the guanidino moiety and NMA. The data of the above experiment are further illustrated in Figure 4, showing the numbers of C-terminal peptides not containing arginine ("C-term"), -COOH arginine containing peptides modified with NMA ("NMA"), and -COOH arginine containing peptides not modified with NMA ("R") in fractions 1-10 (which represent the respective secondary fractions 1-10 pooled from the two primary fractions 36-40 and 40-44). As appears from Figure 4, NMA-modified peptides are primarily found in the later secondary fractions (8, 9, 10), indicating that NMA produces a significant shift in peptide retention time, which can be exploited in the methods of the invention to isolate peptides not altered by NMA. Further optimisation of chromatographic conditions, e.g., narrowing the collection windows and/or increasing the NMA-induced shift by modifications to the chromatography buffer systems, may yet improve the separation of NMA-modified and non-modified peptides.
Table 6 lists the 100 unique peptide sequences identified in the above experiment and also indicates: the unique UniProtKB/Swiss-Prot database (http://www.expasy.org/uniprot/) accession number of the protein from which the peptides originated, whether or not they correspond to the C-terminus of a protein, and how many of the actually isolated peptides corresponding to said unique peptide sequences were modified with NMA.
Table 6.
Protein acc # | Sequence | C-term | NMA-modified / Total isolated
Figure imgf000101_0001
Figure imgf000102_0001
The following conditions were used for the above RP-HPLC separation of peptides. The HPLC column was an analytical RP-HPLC column: 2.1 mm internal diameter (I.D.) x 150 mm (length) 300SB-C18 column, Zorbax® (Agilent, Waldbronn, Germany). A binary solvent gradient of HPLC solvent A (10 mM ammonium acetate (pH 5.5) in water/acetonitrile, 98/2 (v/v)) and HPLC solvent B (10 mM ammonium acetate (pH 5.5) in water/acetonitrile, 30/70 (v/v)) was applied for separating the peptide mixtures:
- following injection of a sample onto the column, a 10 min isocratic run with 100% of solvent A at a constant flow rate of 80 μl/min was applied;
- a linear, binary gradient over 100 min to 100% of solvent B was applied;
- a 10 min isocratic wash with 100% of solvent B, followed by a linear gradient over 5 min to 0% of solvent B, was applied;
- the column was re-equilibrated for another 20 min with 100% of solvent A before injection of another sample.
Depending upon the type of peptides isolated and thus the preceding protein preparation steps, it was observed that peptides typically eluted between 20 and 100 min of gradient time, corresponding to acetonitrile concentrations of 7% and 63% respectively. 4 min wide primary fractions were collected in this range.
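The relation between run time and acetonitrile content follows from the solvent compositions and the gradient programme given above. The Python sketch below computes the nominally programmed acetonitrile percentage during the isocratic hold and the 100 min gradient. It assumes that time is counted from injection and ignores any system delay volume; both are assumptions, so the computed values need not coincide exactly with the percentages reported in the text. Under this reading the programmed value at 100 min is about 63%.

```python
ACN_IN_A = 2.0        # % acetonitrile in solvent A (water/acetonitrile 98/2)
ACN_IN_B = 70.0       # % acetonitrile in solvent B (water/acetonitrile 30/70)
ISOCRATIC_MIN = 10.0  # initial hold at 100% solvent A
GRADIENT_MIN = 100.0  # linear gradient from 0% to 100% solvent B


def fraction_b(t_min):
    """Programmed fraction of solvent B at t_min after injection.

    Covers the isocratic hold and the linear gradient only; the wash
    and re-equilibration steps are not modelled.
    """
    if t_min <= ISOCRATIC_MIN:
        return 0.0
    return min((t_min - ISOCRATIC_MIN) / GRADIENT_MIN, 1.0)


def percent_acn(t_min):
    """Nominal acetonitrile percentage delivered at t_min after injection."""
    return ACN_IN_A + (ACN_IN_B - ACN_IN_A) * fraction_b(t_min)


for t in (10, 20, 60, 100, 110):
    print(t, round(percent_acn(t), 1))
```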
Example 5: A biomarker discovery platform
A biomarker discovery platform may employ a "reference design mode" in which query samples (e.g., from diseased individuals) and control samples (e.g., from healthy individuals) are quantitated relative to a same reference sample pool. Query and control samples are thus compared indirectly. For example, about 10 query samples may be quantified vs. the pool and about 10 control samples may be quantified vs. the same pool. The example below describes the comparison of one sample vs. a reference pool, using an exemplary layout of peptide sorting, separation and identification based on SCX isolation of N-terminal peptides.
The building blocks of a biomarker discovery pipeline applied to to-be-compared protein samples are as summarised below: Depletion of abundant peptides → Reduction/alkylation of Cys → modification of N-terminal α-NH2 groups (acetylation and reverse acetylation) → tryptic digestion → labelling → controlled mixing → SCX: collection of flow-through comprising N-terminal and C-terminal peptides → separation of flow-through using orthogonal 2D set-up → MALDI-MS analysis → Quantification/feature selection (expression matrix/data mining) → MALDI-MS/MS → data interpretation and identification of differentially present proteins.
Preparation of samples
100 μl of a crude serum sample and 100 μl of a reference serum pool (pool of 7 males - 1/1/1/1/1/1/1 v:v) were diluted 1:5 in buffer A, part of the MARS depletion system (Agilent), filtered through a spin filter and each depleted in 4 runs on a human MARS column. The four flow-throughs of each (4 x 1 ml) were subsequently combined, resulting in one serum sample fraction and one reference serum pool fraction. Albumin and IgG depletion efficiency was tested via Western blotting. A rough estimate of the protein concentration of the flow-throughs was obtained by measuring the absorbance at 280 nm (Evolution 3000, Thermo Electron, Waltham, MA, USA). The protein samples were subsequently concentrated 2x using a Vivaspin filter with a MWCO of 3000 Da to yield a protein concentration of 1.24 mg/2 ml for the sample and 1.18 mg/2 ml for the reference pool (concentration determined using BCA).
The proteins were subsequently denatured by adding guanidinium hydrochloride (final concentration 3 M), then reduced and alkylated using TCEP and iodoacetamide added in a 25-fold and 50-fold molar excess, respectively. Reduction took place at 30°C during 10 min; alkylation at 60°C during 1 h.
The mixtures were subsequently acetylated at 30°C during 90 min by adding sulfo-N-hydroxysuccinimide-acetate in a 75-fold molar excess. A deacetylation, for 20 min at room temperature, with ammonium hydroxide in a 3.5-fold molar excess compared to sulfo-NHS-acetate, was performed to de-acetylate the serines and threonines that might have become acetylated during the acetylation step. Following the reverse acetylation step, the samples were desalted on a PD10 column and captured in a 10 mM NH4HCO3 buffer at pH 8. Protein amounts were measured as 870 μg (sample - 72% recovery) and 910 μg (reference pool - 73% recovery).
The samples, present in 3.5 ml following PD-10 desalting and buffer exchange, were subsequently dried to 2 ml and digested with trypsin in a substrate:trypsin ratio of 50:1 (w:w) by overnight incubation at 37°C. The samples were acidified to pH 6 (by adding 10% FA) the following day and completely dried. 300 μl of H2(16)O was added to the reference pool and 300 μl of H2(18)O to the serum sample, and labelling took place during 40 h.
125 μg 16O (41.21 μl) and 125 μg 18O (43.10 μl) labelled samples were subsequently combined in a controlled manner to prevent back-exchange. As can be derived from these amounts, 2 times 25 μl of serum (sample and reference pool) is sufficient for one experiment, thereby reducing the number of depletion runs by a factor of 4. Mixing both samples at pH 6 appeared to induce an immediate back-exchange. Therefore, the 16O and 18O labelled samples were acidified (to pH 3) prior to mixing. In this way back-exchange could be prevented. TFA was advantageously used to acidify the sample, since FA tends to interfere with the successful operation of the SCX column.
After acidification, ACN was added to a final concentration of 50%. The latter was used to prevent non-specific interaction with the SCX column. The final volume was 550 μl (41.21 μl 16O, 43.10 μl 18O, 275 μl ACN, 18 μl 1% TFA, 172.7 μl water), allowing the injection of 500 μl onto the SCX column. Injection of relatively large volumes onto the SCX column allows the dilution of salts present in the sample; in the case presented, salts originated from the NH4HCO3 buffer. If sample salt concentrations are too high, the binding of internal peptides onto the SCX column might be prevented. The final salt concentration in the sample was ~20 mM, which is not expected to interfere with the successful operation of the SCX column. Higher loop volumes (up to 2 ml) were tested to further reduce salt concentration, but the flow-through volume became too high, resulting in inefficient sample handling.
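The composition of the SCX loading mixture can be checked from the listed component volumes. The Python sketch below sums the volumes, derives the acetonitrile fraction, and back-calculates the peptide concentration of each labelled sample from the 125 μg aliquots; the dictionary keys and variable names are illustrative only.

```python
# Component volumes of the SCX loading mixture, in microlitres.
components_ul = {
    "16O-labelled reference pool": 41.21,
    "18O-labelled serum sample": 43.10,
    "ACN": 275.0,
    "1% TFA": 18.0,
    "water": 172.7,
}

total_ul = sum(components_ul.values())
acn_percent = 100.0 * components_ul["ACN"] / total_ul

# Each aliquot carries 125 micrograms of labelled peptide material.
aliquot_ug = 125.0
conc_16o = aliquot_ug / components_ul["16O-labelled reference pool"]
conc_18o = aliquot_ug / components_ul["18O-labelled serum sample"]

# Fraction of the mixture actually injected (500 of about 550 microlitres).
injected_fraction = 500.0 / total_ul

print(round(total_ul, 2), "ul total")
print(round(acn_percent, 1), "% ACN")
print(round(conc_16o, 2), "ug/ul (16O)", round(conc_18o, 2), "ug/ul (18O)")
print(round(100.0 * injected_fraction, 1), "% of the mixture injected")
```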
SCX operation
The column used was a Zorbax 300 Angstrom SCX column (2.1 mm ID, 5 cm L, 3.5 μm particle diameter) (Agilent). Stationary phase consists of silica particles with negatively charged residues (sulfonic acid) attached. This residue is charged over a wide pH range. The use of a 15 cm column was also considered, however, flow-through volume was higher and equilibration times 3x longer. The 5 cm column has sufficient capacity to handle 250 μg (125 μg 16O and 125 μg 18O).
The SCX procedure consists of several steps:
(1) Sample loading using 10 mM sodium-phosphate (pH 3), 50% ACN. Flow-through is collected during this step.
(2) Elution of the internal peptides in order to prepare the column for a next round of sorting. Two mobile phases are thereby used: 10 mM sodium-phosphate (pH 3), 20% ACN and 10 mM sodium-phosphate (pH 3), 20% ACN, 2 M NaCl. The former is used to switch from 50% ACN containing buffer to 20% ACN containing buffer. The latter is used to elute the internal peptides. A NaCl gradient is thereby applied. By switching to 20% ACN containing buffers, salt precipitation is prevented.
(3) Second, short cleaning by injection of a KCl (2 M) plug onto the column.
(4) Extensive equilibration of the column using 10 mM sodium-phosphate (pH 3), 50% ACN. This buffer is injected several times to equilibrate the injection system.
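The sorting in step (1) relies on peptides with about zero net charge at pH 3 passing through the cation exchanger while positively charged internal peptides bind. The Python sketch below illustrates this with idealised integer charges. It is a simplification and not the procedure of the examples: it assumes a fully dissociated C-terminal carboxyl, neutral Asp/Glu side chains, protonated Arg and His, and neutral (acetylated) Lys by default. The function name and the three peptide sequences are hypothetical.

```python
def idealised_net_charge(sequence, n_term_blocked, lys_blocked=True):
    """Idealised integer net charge of a peptide at about pH 3.

    Assumptions (simplifications for illustration only):
    - the C-terminal carboxyl is dissociated (-1);
    - a free alpha-amino group is protonated (+1), a blocked one is neutral;
    - Arg and His side chains are protonated (+1 each);
    - Lys is neutral when blocked (e.g. acetylated), otherwise +1;
    - Asp and Glu side chains are counted as neutral.
    """
    charge = -1  # C-terminal carboxylate
    if not n_term_blocked:
        charge += 1
    for residue in sequence:
        if residue in ("R", "H"):
            charge += 1
        elif residue == "K" and not lys_blocked:
            charge += 1
    return charge


# Hypothetical peptides from a tryptic digest of an alpha-amino-blocked protein.
n_terminal = idealised_net_charge("ACDEFGR", n_term_blocked=True)   # ends in Arg
internal = idealised_net_charge("LMNPQSR", n_term_blocked=False)    # ends in Arg
c_terminal = idealised_net_charge("STVWYLE", n_term_blocked=False)  # no Arg

for name, q in (("N-terminal", n_terminal), ("internal", internal),
                ("C-terminal", c_terminal)):
    fate = "flow-through" if q <= 0 else "retained on SCX"
    print(name, q, fate)
```

In this idealised picture the N-terminal and C-terminal peptides carry zero net charge and are collected in the flow-through, whereas the internal peptide carries a net charge of +1 and is retained. Peptides containing histidine would deviate from this simple pattern.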
2D-orthogonal chromatography operation
The flow-through (1.3 ml) is collected in one glass vial (1.5 ml capacity with V-shape) and is completely dried. After re-dissolving the sample, it was injected onto a 2D orthogonal HPLC system to resolve the complex mixture of N-terminal peptides, C-terminal peptides and non-tryptic peptides. The 2D set-up consisted of a narrow-bore X-Terra Phenyl HPLC column (15 cm x 2.1 mm ID x 3.5 μm dp) and a nano C18 column (15 cm x 75 μm x 3 μm dp). The orthogonality of a phenyl and a C18 column is limited when they are both operated at low pH. Therefore, the first dimension column was operated at high pH (pH 10). The X-Terra portfolio of columns is specifically designed for operation at higher pH. The nano-LC column was operated at pH 2. The SCX flow-through was dissolved in 500 μl of mobile phase A consisting of 10 mM NH4OAc (pH 10). The entire sample was subsequently injected onto the X-Terra Phenyl LC column. Large volume injection onto the column appeared to be feasible, which is of major importance to limit sample loss during sample handling. A 60 min ACN gradient was applied to separate the mixture (mobile phase B: 80% ACN, 10 mM NH4OAc (pH 10)). 32 one-minute fractions (100 μl volumes) were collected along the gradient. These fractions were dried and re-dissolved in 44 μl of 0.1% FA, i.e., mobile phase A of the second dimension separation. 20 μl of each fraction was injected onto the nano-RPLC column and peptides were separated using a 60 min ACN gradient (mobile phase B: 80% ACN, 0.1% FA). During each separation, 260 spots were deposited on MALDI targets (15 sec spotting intervals). Prior to spotting, peptides eluting from the column were mixed with a matrix solution (4 mg/mL α-cyano-hydroxy-cinnamic acid in 70% ACN, 0.1% TFA) at a microtee. More than 8000 spots were generated using this set-up. All fractions were subsequently analyzed by MALDI-MS and MS/MS.
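The spot count quoted above follows from the number of first-dimension fractions and the spotting scheme. The Python sketch below reproduces this bookkeeping from the stated figures; the variable names are illustrative.

```python
first_dim_fractions = 32      # one-minute fractions from the phenyl column
spots_per_run = 260           # MALDI spots per second-dimension run
spot_interval_s = 15          # spotting interval in seconds
redissolved_ul = 44.0         # volume each dried fraction is re-dissolved in
injected_ul = 20.0            # volume injected onto the nano-RPLC column

total_spots = first_dim_fractions * spots_per_run
spotting_time_min = spots_per_run * spot_interval_s / 60.0
injected_share = injected_ul / redissolved_ul

print(total_spots, "spots in total")
print(spotting_time_min, "min of spotting per second-dimension run")
print(round(100.0 * injected_share, 1), "% of each fraction injected")
```

The total of 8320 spots is consistent with the statement that more than 8000 spots were generated, and roughly 45% of each re-dissolved fraction is consumed per injection.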
MS and MS/MS measurements were performed on a 4800 MALDI-TOF/TOF machine in the positive reflectron mode using default calibration. The scan range for the MS spectra stretched from 500 to 4000. A list of the top 20 signals per MS spectrum was generated, and MS/MS experiments were performed under "metastable precursor on" conditions, without the use of CID (collision induced dissociation) and at 1 keV. The precursor mass window was set at a resolution of 250 FWHM (full width at half maximum). Unfiltered MASCOT generic files (mgf) were subsequently searched against both standard and ragged human Sprot databases using MASCOT as search engine. The latter database was used to detect N-terminally ragged peptides, which are abundantly present in serum. Only peptides ranking #1 with scores above the 95% probability threshold were retained. Spectra that had multiple peptide hits above the probability threshold were regarded as unidentified. Random hits were determined by searching the data against randomized databases. Proteins were reported if they had at least 1 peptide that unequivocally defined them.
The results can be consulted in Figures 6 and 7 and Table 7.
Table 7. Data analysis at both the MALDI-MS and -MS/MS level. Very strict criteria were used.
Figure imgf000107_0001
Other orthogonal separations
Other orthogonal separations were considered and tested: C18 and phenyl X-Terra columns operated at pH values ranging between 6.8 and 10 (tested), size-exclusion columns, IEC, CE, CEC and CIEF, FFE, and a ZIC-HILIC column operated at pH 6.8 (tested). HILIC can be considered as the reverse of reversed-phase LC. This means that mobile phase A contains high concentrations of ACN, and mobile phase B high concentrations of water. Water is actually the strongest eluent in HILIC. This separation mode appeared to be highly orthogonal with RPLC at low pH, as demonstrated in Figure 8. RPLC separations were performed as described above. The ZIC-HILIC column (15 cm L x 2.1 mm ID x 3.5 μm dp) was operated at a flow rate of 100 μl/min. Mobile phase A consisted of 85% ACN, 20 mM NH4OAc (pH 6.8) while mobile phase B consisted of 40% ACN, 20 mM NH4OAc (pH 6.8). A linear gradient between 0 and 100% B was applied in 60 min.
One drawback of HILIC separations is that the sample needs to be dissolved in mobile phases with a high ACN content (between 80 and 90% ACN), especially when combined with large volume injections (to obtain sufficient focusing onto the column). A number of peptides tend to precipitate at high ACN concentrations. We have developed a strategy to allow injections from 100% H2O containing solutions. Peptides are first injected and focused onto a short (1.25 cm x 2.1 mm) Zorbax Extend C18 column and, after a sufficient loading time, the peptides are eluted onto the HILIC column by placing the C18 column in-line with the HILIC column. Since mobile phase A of the HILIC column contains 85% ACN, 20 mM NH4OAc (pH 6.8), all peptides are desorbed from the C18 column (ACN is a strong eluent in RPLC) and focused onto the HILIC column (ACN is a weak eluent in HILIC). Peptides are subsequently separated by applying a water gradient (mobile phase B: 40% ACN, 20 mM NH4OAc (pH 6.8)). C18 and Phenyl X-Terra columns operated at high pH (pH 10) yield similar results, but we believe that peptide recovery from the latter column may be better.

Claims

1. A method for protein identification and optionally quantification from a protein mixture comprising the steps:
(a) fragmenting a mixture of proteins to obtain a protein peptide mixture;
(b) isolating from the protein peptide mixture:
(ba) peptides comprising the N-terminal ends of proteins of the mixture of proteins (i.e., N-terminal peptides), and/or
(bb) peptides comprising the C-terminal ends of proteins of the mixture of proteins (i.e., C-terminal peptides);
(c) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
(d) identifying and optionally quantifying one or more N-terminal and/or C-terminal peptides from one or more of said fractions, whereby said identified N-terminal and/or C-terminal peptides represent one or more proteins from the mixture of proteins.
2. The method according to claim 1, wherein C-terminal peptides are isolated or enriched from the protein mixture using steps comprising:
- proteolysing the mixture of proteins by trypsin or trypsin-like protease to obtain the protein peptide mixture;
- isolating, under conditions where substantially all Arg and Lys side chains are protonated, a subset of peptides from the protein peptide mixture, wherein the peptides of said subset have about zero net charge under said conditions.
3. The method according to claim 1, wherein N-terminal and C-terminal peptides are isolated or enriched from the protein mixture using steps comprising:
- blocking α-NH2 groups of proteins in the protein mixture to prevent their protonation under acidic conditions;
- proteolysing the protein mixture by trypsin or trypsin-like protease to obtain the protein peptide mixture;
- isolating, under conditions where substantially all Arg and Lys (if not blocked) side chains are protonated, a subset of peptides from the protein peptide mixture, wherein the peptides of said subset have about zero net charge under said conditions.
4. The method according to claim 2 or 3, wherein said conditions encompass acidic conditions, preferably pH about 4.0 or less, more preferably pH about 3.0 or less.
5. The method according to any of claims 2 to 4, wherein the subset of peptides displaying about zero net charge is isolated using ion exchange chromatography (IEC), preferably cation exchange chromatography (CEC), more preferably strong cation exchange (SCX) chromatography.
6. The method according to any of claims 1 to 5, wherein the dimensions of the multidimensional process resolve peptides on the basis of different physical and/or chemical properties chosen from the group comprising net charge, electrophoretic mobility (EPM), isoelectric point (pI), molecular size and/or ability or tendency to form certain type(s) of molecular interactions, such as dispersive (hydrophobic) interactions, dipole-dipole polar interactions, dipole-induced dipole polar interactions or ionic interactions.
7. The method according to any of claims 1 to 6, wherein the multidimensional separation process is multidimensional chromatography, preferably 4D-, more preferably 3D-, even more preferably 2D-chromatography.
8. The method according to claim 7, wherein in one dimension the chromatography uses RP-HPLC at high pH and in another dimension the chromatography uses RP-HPLC at low pH.
9. The method according to any of claims 7 or 8, wherein in one dimension the chromatography uses C18 RPLC and in another dimension chromatography uses stationary phase functionalised with phenyl moiety.
10. The method according to claim 7, wherein in one dimension the chromatography uses C18 RPLC and in another dimension the chromatography uses hydrophilic interaction chromatography (HILIC), preferably ZIC-HILIC.
11. The method according to any of claims 1 to 5, wherein the long-column chromatography uses a stationary phase column having length of at least 1 metre.
12. A method for identification of proteins differentially present between a first protein mixture and a second protein mixture comprising the steps:
(a) fragmenting the first protein mixture to obtain a first protein peptide mixture and fragmenting the second protein mixture to obtain a second protein peptide mixture;
(b) labelling the first protein peptide mixture with a first isotope and labelling the second protein peptide mixture with a second isotope different from the first isotope;
(c) combining the protein peptide mixtures from (b) and isolating from said combined peptide mixtures:
(ca) N-terminal peptides of proteins of the first and second protein mixtures, and/or
(cb) C-terminal peptides of proteins of the first and second protein mixtures;
(d) separating the isolated N-terminal and/or C-terminal peptides into fractions of peptides via a multidimensional separation process or via one-dimensional long-column chromatography; and
(e) identifying one or more N-terminal and/or C-terminal peptides which are differentially present in the first protein peptide mixture as compared to the second protein peptide mixture, whereby said identified N-terminal and/or C-terminal peptides represent one or more proteins differentially present between the first and second protein mixtures.
13. Use of the method according to any of claims 1 to 12 for identification of proteins differentially present between different samples, preferably for identification of biomarkers.
14. A peptide sorting device or system configured to perform the method according to any of claims 1 to 12, preferably wherein two or more peptide sorting or separation steps are performed in-line.
15. A method for isolating or enriching, from a protein peptide mixture obtained from a protein or a mixture of proteins, the peptides that comprise the C-terminal ends of said protein or mixture of proteins, comprising the steps of:
(a) fragmenting the protein or the mixture of proteins preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture; and
(b) isolating a subset of peptides from said protein peptide mixture, comprising the steps of: reacting the protein peptide mixture with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset of peptides unaltered by said reacting.
16. The method according to claim 15, wherein said isolating the subset of unaltered peptides is by chromatography.
17. The method according to any of claims 15 or 16, wherein said step (b) for isolating a subset of peptides from said protein peptide mixture, comprises the steps of:
(ba) separating the protein peptide mixture into fractions of peptides via chromatography,
(bb) reacting at least one and preferably each peptide fraction from step (ba) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, thereby obtaining altered and unaltered peptides for each so reacted fraction, and
(bc) isolating the subset of unaltered peptides out of each so reacted fraction via chromatography, wherein the chromatography of steps (ba) and (bc) is performed with the same type of chromatography.
18. The method according to any of claims 15 to 17, wherein the side chains of said one or more amino acid residue types X1, X2,... Xn comprise reactive moieties.
19. The method according to claim 18, wherein said reactive moieties are chosen from: mercapto, alkylthio, hydroxyphenyl, primary amino, secondary amino, guanidino, ureyl and carboxyl; more preferably from: mercapto, methylthio, p-hydroxyphenyl, primary amino, indyl, pyrrolidinyl, imidazyl, guanidino or carboxyl.
20. The method according to any of claims 15 to 19, wherein the one or more amino acid residue types X1, X2,... Xn are chosen from the group consisting of Met, Cys, Tyr, Trp, Pro, His, Lys, Arg, hArg, Glu and Asp.
21. The method according to any of claims 15 to 20, wherein:
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain primary amino group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from 1-fluoro-2,4-dinitrobenzene, trinitrobenzene sulphonic acid, ethylthiotrifluoroacetate, succinyl anhydride or, preferably, O-methylisourea; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain guanidino group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from a dicarbonyl compound or derivative thereof, a peptidylarginine deiminase or an arginase; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain mercapto group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from iodoacetate, 1-fluoro-2,4-dinitrobenzene, N-ethylmaleimide, p-hydroxymercuribenzoate, 5,5'-dithiobis(2-nitrobenzoic acid), or performic acid; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain alkylthio group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from cyanogen bromide, iodoacetate or performic acid (reacting to methionine sulphone); and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain carboxyl group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from diazomethane or glycine methyl ester; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain indyl group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from iodoacetate or diethylpyrocarbonate; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain imidazyl group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from 2,4-dinitrophenylsulphenyl chloride or N-bromosuccinimide; and/or
- when said one or more amino acid residue types X1, X2,... Xn comprise a side chain hydroxyphenyl group, said residues are modified in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from tetranitromethane.
22. The method according to any of claims 15 to 20, wherein:
- when said one or more amino acid residue types X1, X2,... Xn is Arg, Lys or hArg, preferably Arg or Lys, said residues are removed in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from carboxypeptidase B, carboxypeptidase U or carboxypeptidase D; and/or
- when said one or more amino acid residue types X1, X2,... Xn is a basic amino acid, preferably Arg or Lys, more preferably Lys, said residues are removed in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from carboxypeptidase N; and/or
- when said one or more amino acid residue types X1, X2,... Xn is Glu, said residues are removed in step (b) by reacting the protein peptide mixture or fraction thereof with an agent chosen from carboxypeptidase G.
23. The method according to any of claims 15 to 20, wherein the protein or the mixture of proteins is fragmented in step (a) preferentially at peptide bonds C-terminally adjacent to amino acid residue(s) comprising a guanidino moiety, preferably C-terminally adjacent to arginine and/or homoarginine, and/or C-terminally adjacent to lysine where the lysine is converted to homoarginine subsequent to the fragmentation.
24. The method according to claim 23, wherein peptides of the protein peptide mixture comprising a guanidino moiety are modified by reacting with an agent chosen from a dicarbonyl compound or derivative thereof, peptidylarginine deiminase or arginase.
25. The method according to claim 24, wherein said dicarbonyl compound or derivative thereof is chosen from arylglyoxal, preferably phenylglyoxal or hydroxyphenylglyoxal, or nitromalondialdehyde (NMA).
26. The method according to any of claims 15 to 20, wherein the protein or the mixture of proteins is fragmented in step (a) preferentially at peptide bonds C-terminally adjacent to basic amino acid residue(s), preferably C-terminally adjacent to Arg and/or hArg and/or Lys.
27. The method according to claim 26, wherein the basic amino acid residue is removed from peptides of the protein peptide mixture comprising such residue as their last residue, by reacting with an agent chosen from carboxypeptidase B, carboxypeptidase U, carboxypeptidase D or carboxypeptidase N, preferably carboxypeptidase B.
28. The method according to any of claims 15 to 27, wherein said fragmenting of the protein or mixture of proteins in step (a) is effected by an endoproteinase, preferably by trypsin type endoproteinase or trypsin-like endoproteinase, more preferably by trypsin.
29. The method of any of claims 15 to 28, comprising the steps of:
(i) fragmenting the protein or the mixture of proteins preferentially at peptide bonds C-terminally adjacent to one or more amino acid residue types (X1, X2,... Xn) to obtain the protein peptide mixture, wherein said one or more amino acid residue types X1, X2,... Xn are basic;
(ii) isolating, under conditions where the majority of the C-terminal -C(=O)OH groups of the peptides of the protein peptide mixture are dissociated, i.e., -C(=O)O⁻, the majority of the N-terminal -NH2 groups of said peptides are protonated, i.e., -NH3⁺, and the basic side chain moiety of the majority of basic amino acids adjacent to which the protein or the mixture of proteins were proteolysed are protonated, a subset of peptides from the protein peptide mixture, wherein the peptides of said subset have about zero net charge under said conditions; and
(iii) isolating a subset of peptides from said subset of (ii), comprising the steps of: reacting the subset of (ii) with an agent capable of specifically modifying or removing said one or more amino acid residue types X1, X2,... Xn, and isolating from the so reacted protein peptide mixture the subset of peptides unaltered by said reacting.
30. The method according to claim 29, wherein in step (i) the protein or the mixture of proteins is fragmented preferentially at peptide bonds C-terminally adjacent to arginine and/or lysine and/or homoarginine residues.
31. The method according to any of claims 29 or 30, wherein in step (ii) the pH is between 2.5 and 4.0, preferably between 2.5 and 3.5, even more preferably between 2.75 and 3.25 and most preferably about 3.
32. The method according to any of claims 29 to 31, wherein in step (ii) said isolating is by cation exchange chromatography, preferably by strong cation exchange ("SCX") chromatography, or isoelectric focusing of peptides, or zwitterionic ion exchange chromatography.
33. The method according to any of claims 15 to 32, further comprising the step of identifying peptides of said isolated subset of peptides, and preferably identifying proteins corresponding to said identified peptides.
34. The method according to claim 33, wherein the step of identifying the peptides is by mass spectrometry.
35. A method to determine the relative amount of one or more proteins in two or more samples comprising proteins, the method comprising the steps of (a) labelling the peptides present in a first sample with a first isotope; (b) labelling the peptides present in a second sample with a second isotope; (c) combining the protein peptide mixture of the first sample with the protein peptide mixture of the second sample; (d) isolating a subset of peptides as defined in any of claims 15 to 32; (e) performing mass spectrometric analysis of the isolated peptides; (f) calculating the relative amounts of the isolated peptides in each sample by comparing the peak heights of the identical but differentially isotopically labelled isolated peptides; and (g) determining the identity of the isolated peptides and their corresponding proteins.
36. A method to quantify the amount of one or more proteins in a sample comprising proteins, comprising the steps of: (a) preparing a protein peptide mixture, preferably as defined in any of claims 15 to 34; (b) adding to the mixture a known amount of a synthetic reference peptide labelled with an isotope distinguishable from the reference peptide isotope; (c) isolating a subset (S) of peptides as defined in any of claims 15 to 32; (d) performing mass spectrometric analysis of the isolated peptides; and (e) determining the amount of the protein present in the sample by comparing the peak heights of the synthetic reference peptide to the reference peptide.
37. The use of the method according to any of claims 15 to 36 for the study of protein processing, preferably protein processing in vivo or in cell culture.
38. The use of the method according to any of claims 15 to 36 for the identification and/or quantification of proteins from one species on the background of proteins from another species.
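The two-stage selection of claim 29, steps (ii) and (iii), can be illustrated with a minimal Python sketch. It is not the applicant's procedure, only an idealized integer charge count under the conditions stated in step (ii): a dissociated C-terminal carboxyl group counts as -1, a free N-terminal amine as +1, and each basic residue as +1. The example sequences, the `n_blocked` flag, the function names, the treatment of histidine as protonated, and the treatment of Asp/Glu side chains as uncharged near pH 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: Lys, Arg and His side chains are all protonated near pH 3.
BASIC_RESIDUES = set("KRH")
# Residues after which a trypsin-like enzyme cleaves (claim 30, minus hArg).
CLEAVAGE_RESIDUES = set("KR")


@dataclass(frozen=True)
class Peptide:
    sequence: str            # one-letter amino acid codes
    n_blocked: bool = False  # hypothetical flag, e.g. N-terminal acetylation


def idealized_net_charge(peptide: Peptide) -> int:
    """Integer charge estimate under the step (ii) conditions."""
    charge = -1                      # dissociated C-terminal carboxylate
    if not peptide.n_blocked:
        charge += 1                  # protonated free N-terminal amine
    charge += sum(1 for aa in peptide.sequence if aa in BASIC_RESIDUES)
    return charge


def select_zero_charge(peptides):
    """Step (ii): keep peptides with about zero net charge."""
    return [p for p in peptides if idealized_net_charge(p) == 0]


def select_unaltered(peptides):
    """Step (iii): keep peptides that an agent acting on the C-terminal
    basic residue (for example carboxypeptidase B) would leave unchanged."""
    return [
        p for p in peptides
        if p.sequence and p.sequence[-1] not in CLEAVAGE_RESIDUES
    ]


# Made-up example peptides, not taken from the application.
internal = Peptide("AGLDESTVK")                       # internal tryptic peptide
blocked_nterm = Peptide("MEEPQSDPSVEPPLSQETFSDLWK", n_blocked=True)
protein_cterm = Peptide("LDSGEPAAEW")                 # no basic residue
with_his = Peptide("HLDSGAW")

mixture = [internal, blocked_nterm, protein_cterm, with_his]
after_step_ii = select_zero_charge(mixture)
after_step_iii = select_unaltered(after_step_ii)
```

In this toy model the N-terminally blocked peptide and the protein C-terminal peptide both pass step (ii), and only the C-terminal peptide survives step (iii). A real separation depends on actual pKa values, missed cleavages and chromatographic behaviour, none of which are modelled here.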
PCT/EP2008/055802 2007-05-10 2008-05-13 Isolation of peptides and proteomics platform Ceased WO2008138916A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
EP07107907.3 2007-05-10
EP07107908 2007-05-10
EP07107908.1 2007-05-10
EP07107907 2007-05-10
EP07114983.5 2007-08-24
EP07114983 2007-08-24

Publications (1)

Publication Number Publication Date
WO2008138916A1 (en) 2008-11-20

Family

ID=39790133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/055802 Ceased WO2008138916A1 (en) 2007-05-10 2008-05-13 Isolation of peptides and proteomics platform

Country Status (1)

Country Link
WO (1) WO2008138916A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010086386A1 (en) * 2009-01-30 2010-08-05 Pronota N.V. Protein quantification methods and use thereof for candidate biomarker validation
CN103175933A (en) * 2013-02-20 2013-06-26 广州市质量监督检测研究院 Method for quantitatively detecting alkylphenol polyoxyethylene in daily chemical products
WO2015118152A1 (en) * 2014-02-07 2015-08-13 European Molecular Biology Laboratory Proteomic sample preparation using paramagnetic beads
EP2855425A4 (en) * 2012-06-04 2016-05-18 Scripps Research Inst NEW PHENYL GLYOXAL PROBES
CN107056891A (en) * 2017-04-19 2017-08-18 广东南芯医疗科技有限公司 One group of polypeptide and its application for preparation system lupus erythematosus diagnosis product
CN107384998A (en) * 2016-05-16 2017-11-24 中国科学院大连化学物理研究所 A kind of protein C based on carboxypeptidase and strong cation exchange chromatography-end enrichment method
US9890197B2 (en) 2010-05-31 2018-02-13 London Health Sciences Centre Research Inc. RHAMM binding peptides
WO2018073409A1 (en) * 2016-10-21 2018-04-26 Westfälische Wilhelms-Universität Münster Specific ac5 inhibitor
CN109142611A (en) * 2017-06-15 2019-01-04 中国科学院大连化学物理研究所 A kind of enrichment method of the SUMOization peptide fragment based on hydrophobic grouping modification
US10421874B2 (en) 2016-06-30 2019-09-24 Ppg Industries Ohio, Inc. Electrodepositable coating composition having improved crater control
CN110938131A (en) * 2019-11-08 2020-03-31 上海交通大学 A kind of biologically active polypeptide RDLDAPDDVDFF and its preparation method and application
CN119574732A (en) * 2024-11-28 2025-03-07 浙江省杭州生态环境监测中心 A method for fully automatic online rapid detection of perfluorinated compounds in water

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077016A2 (en) * 2001-03-22 2002-10-03 Vlaams Interuniversitair Instituut Voor Biotechnologie Vzw Methods and apparatus for gel-free qualitative and quantitative proteome analysis, and uses therefore

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077016A2 (en) * 2001-03-22 2002-10-03 Vlaams Interuniversitair Instituut Voor Biotechnologie Vzw Methods and apparatus for gel-free qualitative and quantitative proteome analysis, and uses therefore
US20040005633A1 (en) * 2001-03-22 2004-01-08 Joel Vandekerckhove Methods and apparatuses for gel-free qualitative and quantitative proteome analysis, and uses therefore

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
AIVALIOTIS MICHALIS ET AL: "Large-scale identification of N-terminal peptides in the halophilic archaea Halobacterium salinarum and Natronomonas pharaonis.", JOURNAL OF PROTEOME RESEARCH JUN 2007, vol. 6, no. 6, 20 April 2007 (2007-04-20), pages 2195 - 2204, XP002498619, ISSN: 1535-3893 *
GEVAERT K ET AL: "Protein identification methods in proteomics.", ELECTROPHORESIS APR 2000, vol. 21, no. 6, April 2000 (2000-04-01), pages 1145 - 1154, XP002498623, ISSN: 0173-0835 *
GEVAERT KRIS ET AL: "Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides.", NATURE BIOTECHNOLOGY MAY 2003, vol. 21, no. 5, May 2003 (2003-05-01), pages 566 - 569, XP002498621, ISSN: 1087-0156 *
GEVAERT KRIS ET AL: "Protein processing and other modifications analyzed by diagonal peptide chromatography.", BIOCHIMICA ET BIOPHYSICA ACTA DEC 2006, vol. 1764, no. 12, December 2006 (2006-12-01), pages 1801 - 1810, XP002498620, ISSN: 0006-3002 *
GHESQUIÈRE BART ET AL: "A new approach for mapping sialylated N-glycosites in serum proteomes.", JOURNAL OF PROTEOME RESEARCH NOV 2007, vol. 6, no. 11, 10 May 2007 (2007-05-10), pages 4304 - 4312, XP002498622, ISSN: 1535-3893 *
GILANY K ET AL: "The proteome of the human neuroblastoma cell line SH-SY5Y: An enlarged proteome", BIOCHIMICA ET BIOPHYSICA ACTA (BBA) - PROTEINS & PROTEOMICS, ELSEVIER, vol. 1784, no. 7-8, 1 July 2008 (2008-07-01), pages 983 - 985, XP022732416, ISSN: 1570-9639, [retrieved on 20080319] *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010086386A1 (en) * 2009-01-30 2010-08-05 Pronota N.V. Protein quantification methods and use thereof for candidate biomarker validation
US9890197B2 (en) 2010-05-31 2018-02-13 London Health Sciences Centre Research Inc. RHAMM binding peptides
EP2855425A4 (en) * 2012-06-04 2016-05-18 Scripps Research Inst NEW PHENYL GLYOXAL PROBES
US9347948B2 (en) 2012-06-04 2016-05-24 The Scripps Research Institute Phenyl glyoxal probes
US9921225B2 (en) 2012-06-04 2018-03-20 The Scripps Research Institute Phenyl glyoxal probes
CN103175933B (en) * 2013-02-20 2014-11-05 广州市质量监督检测研究院 Method for quantitatively detecting alkylphenol polyoxyethylene in daily chemical products
CN103175933A (en) * 2013-02-20 2013-06-26 广州市质量监督检测研究院 Method for quantitatively detecting alkylphenol polyoxyethylene in daily chemical products
WO2015118152A1 (en) * 2014-02-07 2015-08-13 European Molecular Biology Laboratory Proteomic sample preparation using paramagnetic beads
EP3102612B1 (en) * 2014-02-07 2024-10-09 European Molecular Biology Laboratory Proteomic sample preparation using paramagnetic beads
CN107384998A (en) * 2016-05-16 2017-11-24 中国科学院大连化学物理研究所 A kind of protein C based on carboxypeptidase and strong cation exchange chromatography-end enrichment method
US10421874B2 (en) 2016-06-30 2019-09-24 Ppg Industries Ohio, Inc. Electrodepositable coating composition having improved crater control
WO2018073409A1 (en) * 2016-10-21 2018-04-26 Westfälische Wilhelms-Universität Münster Specific ac5 inhibitor
CN107056891A (en) * 2017-04-19 2017-08-18 广东南芯医疗科技有限公司 One group of polypeptide and its application for preparation system lupus erythematosus diagnosis product
CN109142611A (en) * 2017-06-15 2019-01-04 中国科学院大连化学物理研究所 A kind of enrichment method of the SUMOization peptide fragment based on hydrophobic grouping modification
CN110938131A (en) * 2019-11-08 2020-03-31 上海交通大学 A kind of biologically active polypeptide RDLDAPDDVDFF and its preparation method and application
CN110938131B (en) * 2019-11-08 2021-07-09 上海交通大学 A kind of biologically active polypeptide RDLDAPDDVDFF and its preparation method and application
CN119574732A (en) * 2024-11-28 2025-03-07 浙江省杭州生态环境监测中心 A method for fully automatic online rapid detection of perfluorinated compounds in water

Similar Documents

Publication Publication Date Title
WO2008138916A1 (en) Isolation of peptides and proteomics platform
Gilmore et al. Advances in shotgun proteomics and the analysis of membrane proteomes
Damoc et al. Structural characterization of the human eukaryotic initiation factor 3 protein complex by mass spectrometry
Maini et al. Incorporation of β-amino acids into dihydrofolate reductase by ribosomes having modifications in the peptidyltransferase center
US20060263886A1 (en) Fluorous labeling for selective processing of biologically-derived samples
Bai et al. Analysis of endogenous D-amino acid-containing peptides in metazoa
Shen et al. Isolation and isotope labeling of cysteine-and methionine-containing tryptic peptides: application to the study of cell surface proteolysis
Aprilita et al. Poly (glycidyl methacrylate/divinylbenzene)-IDA-FeIII in phosphoproteomics
KR20210116502A (en) Methods and systems for identifying and quantifying antibody fragmentation
Yates [15] Protein structure analysis by mass spectrometry
WO2009003952A2 (en) Column and method for preparing a biological sample for protein profiling
Badgujar et al. Enantiomeric purity of synthetic therapeutic peptides: A review
US20100311114A1 (en) Preparation of samples for proteome analysis
EP2488492B1 (en) Protected amine labels and use in detecting analytes
Zhu et al. Analysis of human serum phosphopeptidome by a focused database searching strategy
Chen et al. Depletion of internal peptides by site-selective blocking, phosphate labeling, and TiO2 adsorption for in-depth analysis of C-terminome
Bastos et al. EDTA-functionalized magnetic nanoparticles: A suitable platform for the analysis of low abundance urinary proteins
Drǎguşanu et al. Epitope motif of an anti‐nitrotyrosine antibody specific for tyrosine‐nitrated peptides revealed by a combination of affinity approaches and mass spectrometry
US20110045989A1 (en) Selective enrichment of n-terminally modified peptides from complex samples
JP2010148442A (en) Method for concentrating glycopeptide having sulfated sugar chain and kit therefor
WO2010086386A1 (en) Protein quantification methods and use thereof for candidate biomarker validation
US7943029B2 (en) Method, composition and kit for isoelectric focusing
Yu Developments of high-throughput quantitative top-down proteomics
Pocsfalvi Selective enrichment in phosphopeptides for the identification of phosphorylated mitochondrial proteins
Lai Advances in Proteomic Methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08759519

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08759519

Country of ref document: EP

Kind code of ref document: A1