|
From: Sebastian <seb...@un...> - 2023-04-03 11:40:26
|
Hi, you must not use that data for evaluations unless you know what you are doing. The data are now part of GNPS, MoNA and other repositories and spectral libraries. The data are also used as training data for numerous ML tools. In short: Don't. Best, Sebastian. On 1 April 2023 at 23:44:09 GMT+09:00, "김현우" <hwk...@do...> wrote: > To whom it may concern. Hi. My name is Hyunwoo Kim and I'm a professor in the College of Pharmacy, Dongguk University in S. Korea. Recently, I have been working on a project about mass spectrometry, comparing the prediction performance of many tools including MS-Finder, Sirius and so on. To obtain the results, I would like to use the CASMI2016 data, but when I tried to download the challenge data (http://msbi.ipb-halle.de/download/CASMI2016/CASMI2016_Cat2and3_Challenge_Candidates.zip), the download link was dead. Could you let me know how I can get it? Looking forward to your response. Best, Hyunwoo -------------------------------------------------- Hyunwoo Kim, PhD Assistant Professor College of Pharmacy, Dongguk University 32 Dongguk-ro, Ilsandong-gu, Goyang, Korea 10326 -- This message was sent from my Android device with K-9 Mail. |
|
From: 김현우 <hwk...@do...> - 2023-04-01 15:06:06
|
To whom it may concern. Hi. My name is Hyunwoo Kim and I'm a professor in the College of Pharmacy, Dongguk University in S. Korea. Recently, I have been working on a project about mass spectrometry, comparing the prediction performance of many tools including MS-Finder, Sirius and so on. To obtain the results, I would like to use the CASMI2016 data, but when I tried to download the challenge data (http://msbi.ipb-halle.de/download/CASMI2016/CASMI2016_Cat2and3_Challenge_Candidates.zip), the download link was dead. Could you let me know how I can get it? Looking forward to your response. Best, Hyunwoo -------------------------------------------------- Hyunwoo Kim, PhD Assistant Professor College of Pharmacy, Dongguk University 32 Dongguk-ro, Ilsandong-gu, Goyang, Korea 10326 |
|
From: Neumann, S. <sne...@ip...> - 2018-09-14 06:14:35
|
Hi Samuel, currently we don't have a schedule for a next CASMI, so don't expect a CASMI 2018 for now. Yours, Steffen On Mon, 2018-09-03 at 10:38 +0200, Samuel BERTRAND wrote: > Hi all, > > I was wondering if there is a CASMI 2018 (or 2019) coming up or not? > > Thanks in advance. > samuel > > On 03/09/2018 at 08:04, Neumann, Steffen wrote: > > Hi, > > > > Iris has in the meantime been able to download the archive. We've had intermittent network disruption, and she must've hit one of the bad moments. > > > > Yours, > > Steffen > > > > On Tue, 2018-08-28 at 01:05 +0200, Iris Eckert wrote: > > > Dear CASMI team, > > > > > > I am currently working with the data from CASMI 2016 (http://www.casmi-contest.org/2016/challenges-cat2+3.shtml) and I want to download the Challenge_Candidates.zip file from the page. This link seems to be dead. Can you please send me another link to the CSV zip or the zip itself? > > > > > > Kind regards, > > > > > > I. Eckert -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
|
From: Samuel B. <sam...@un...> - 2018-09-03 08:54:30
|
Hi all, I was wondering if there is a CASMI 2018 (or 2019) coming up or not? Thanks in advance. samuel _______________________________________ Samuel BERTRAND, Maitre de conférence (Associate Professor) Ph.D. Chimie, M.Sc. Chimie (ENSCL), Université de Nantes, -> UFR des Sciences Pharmaceutiques et Biologiques, Tel: +33(0)2 53 48 43 10 (internal: 33 43 10) -> Laboratoire Mer, Molécules, Santé (EA 2160), Tel: +33(0)2 51 12 56 89 (internal: 45 56 89) 9 rue bias BP 61112, 44035 Nantes cedex 1, France https://www.researchgate.net/profile/Samuel_Bertrand/ http://www.mms.univ-nantes.fr/23221706/0/fiche___pagelibre/&RH=MMS_FR1&RF=1352222888411 http://bertrandsamuel.free.fr/gp2a/ http://bertrandsamuel.free.fr/siderophore_base/ http://scholar.google.com/citations?user=Y4_9oyMAAAAJ On 03/09/2018 at 08:04, Neumann, Steffen wrote: > Hi, > > Iris has in the meantime been able to download the archive. We've had intermittent network disruption, and she must've hit one of the bad moments. > > Yours, > Steffen > > On Tue, 2018-08-28 at 01:05 +0200, Iris Eckert wrote: >> Dear CASMI team, >> >> I am currently working with the data from CASMI 2016 (http://www.casmi-contest.org/2016/challenges-cat2+3.shtml) and I want to download the Challenge_Candidates.zip file from the page. This link seems to be dead. Can you please send me another link to the CSV zip or the zip itself? >> >> Kind regards, >> >> I. Eckert |
|
From: Neumann, S. <sne...@ip...> - 2018-09-03 06:16:33
|
Hi, Iris has in the meantime been able to download the archive. We've had intermittent network disruption, and she must've hit one of the bad moments. Yours, Steffen On Tue, 2018-08-28 at 01:05 +0200, Iris Eckert wrote: > Dear CASMI team, > > I am currently working with the data from CASMI 2016 (http://www.casmi-contest.org/2016/challenges-cat2+3.shtml) and I want to download the Challenge_Candidates.zip file from the page. This link seems to be dead. Can you please send me another link to the CSV zip or the zip itself? > > Kind regards, > > I. Eckert -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
|
From: Iris E. <iri...@st...> - 2018-08-27 23:29:45
|
Dear CASMI team, I am currently working with the data from CASMI 2016 (http://www.casmi-contest.org/2016/challenges-cat2+3.shtml) and I want to download the Challenge_Candidates.zip file from the page. This link seems to be dead. Can you please send me another link to the CSV zip or the zip itself? Kind regards, I. Eckert |
|
From: Neumann, S. <sne...@ip...> - 2017-09-08 14:58:20
|
Dear all, Thanks everyone for your interest in this year's CASMI contest. We are looking forward to the submissions for the contest deadline in one week, on the 15th of September. The submissions to the first deadline for the preliminary evaluations also included several interesting new approaches in the field. Due to a data duplication we had to update the MS2 spectrum information for Challenge 15. Because the deadline is so close, we will not count Challenge 15 in the participant summary, but if possible it would be great if you could re-analyse the updated data. Please also double check that you used the precursor mass 577.1 for Challenge 29, where we issued an update of MS1 data files back in May, which was not reflected in the summary-001-045.csv file until now. The mailing list cas...@sf... for submissions has been filtering ZIP attachments, causing bounced submissions, and the SourceForge support team had no way of disabling that. So please note the modified submission instructions on http://casmi-contest.org/2017/rules-submit_evaluate.shtml , where we now ask you to send a link to the ZIP file of your submission, using a file hosting service like http://dropexpire.com/ , or sharing a link via Dropbox or your institution's ownCloud or similar infrastructure. Several participants have already done it that way. Please contact us if you need assistance here. In response to privacy regulations, this summer SourceForge asked their users, including mailing list subscribers, to renew their list subscriptions, asking for express consent to receive email communication. You might want to ask your team colleagues if they still receive CASMI mails, and to renew the subscription if not. Yours, The CASMI organisers |
|
From: Schymanski, E. <Emm...@ea...> - 2017-04-28 09:24:22
|
Dear all, The challenge data for CASMI* 2017 is now online. With 45 (Category 1) and up to 243 (Categories 2&3) natural products challenges - including a few tricky ones - there's something for everyone! There are some changes in 2017 - read the website carefully for more: http://www.casmi-contest.org/2017/ If you have any questions - contact us at cas...@li... All are welcome to participate, please spread the word and help CASMI grow. Happy identifying, Dejan, Nir, Emma and Steffen. *CASMI = Critical Assessment of Small Molecule Identification °°° Dr. Emma Schymanski Eawag - Umweltchemie/Environmental Chemistry Überlandstrasse 133 8600 Dübendorf Schweiz/Switzerland Tel. +41 (0)58 765 55 37 Fax. +41 (0)58 765 58 26 emm...@ea... http://www.eawag.ch/~schymaem |
|
From: Schymanski, E. <Emm...@ea...> - 2016-05-24 06:38:46
|
Dear Tobias, Thanks a lot for pointing this out - every conversion results in issues and this loss/gain of Hs has been a pain for years and is something that all our methods have to deal with. Depending on the converter behind, different structures will be processed differently internally too (the default CDK tautomer on some of our structures is not the one we expect but it is hard to "force" - this makes a huge difference in the log P prediction but also in the fragment prediction). The method was stated on the website as follows: The candidates were retrieved from ChemSpider as SMILES strings on 14/02/2016 and converted to standard InChI and InChIKeys with OpenBabel. The candidate lists are saved as CSV files, with ‘ “ ‘ as quoting character and “,“ as field separator. Candidates where the OpenBabel conversion from SMILES to InChI failed were removed. The presence of the correct solution in the candidates was checked. So of course we did check the candidate was present :-) but only removed those that failed to convert at all. I can add a little piece of background to why we were forced to convert and thus why the keys no longer match the identifier in some cases - the ChemSpider query used in MetFrag actually returns the non-standard InChI and InChIKey and Steffen was forced to convert the data to obtain standard ones. This is not ideal - much better would have been to obtain the original data from the database (because every conversion adds issues). You can get the standard keys from ChemSpider too but this requires a different query - Christoph can provide you the history there better than I. Unfortunately the time was too tight (don't get me started on deadlines...). In the future one could consider filtering these structures (better still fix the retrieval) - but I am not entirely sure it will help because we can catch some cases but for sure not all. 
The formula is easy enough but in some cases the formula is right and it's hidden in a charge at the back, ... etc... We have tried in our own past CASMI submissions - you can define some rules but some always slip through. As soon as some of these structures are loaded internally, they can change - we have seen this all too often in MetFrag. Yes it will affect the ranking but if everyone started from the same candidate sets, we are all in the same boat. I'd still like to see structures from structure generation in a future competition :-) and providing SDFs would also have been better - but also much bigger. If we had provided both (SDF and CSV) then we would most likely have provided inconsistent datasets within our competition (i.e. people start from a different starting point) and this is to be avoided too. Hence the decision to stick with CSV only. Do you think starting with SDF only would be better? Thanks, Emma PS: yes we could also have used PubChem and had Standard Keys from the start but not all the structures were in PubChem and we would have had even more candidate structures - well over double. ________________________________ From: Tobias Kind [tk...@uc...] Sent: Monday, 23 May 2016 9:16 PM To: cas...@li... Cc: cas...@li... Subject: Re: [Casmi-team2016] SDF conversion fail Hi, There are a number of cases where incorrect InChIKeys or InChIs were submitted. In a number of cases they are related to salts or betaines or other complicated structures. 
Example Challenge-88 from the CASMI CSV file: Identifier 3624293; CompoundName 6,7-Dihydroxy-2,4-dimethylchromenium; MonoisotopicMass 191.070267; MolecularFormula C11H11O3; SMILES Cc1cc([o+]c2c1cc(c(c2)O)O)C; InChI InChI=1S/C11H11O3/c1-6-3-7(2)14-11-5-10(13)9(12)4-8(6)11/h3-5,12-13H,1-2H3/q+1; InChIKey BATARTWICDUXRG-UHFFFAOYSA-N. Now open the compound in ChemSpider (the InChIKey cannot be found): http://www.chemspider.com/Search.aspx?q=BATARTWICDUXRG Now open the ChemSpider ID: http://www.chemspider.com/Chemical-Structure.3624293.html It's the same name, but the InChIKey is different (under details): BCVCWDNQJMGQIH-UHFFFAOYSA-O. This happened to many compounds and might also have influenced the rankings when, by random chance, the compound was incorrectly converted. OpenBabel sometimes creates such InChIKeys. ---- There are over 7000 cases in the 2016 challenge where the CASMI-provided formula does not match the calculated formula. For example, for the positive mode challenge candidate with ChemSpider ID 21220342, the CASMI-submitted formula says C21H22N4 but the InChI says InChI=1S/C21H20N4/ Many of them are again charged compounds, salts or isotopes. Some of the severe cases are those where the CASMI-provided formula does not match at all (+/- 2H); see this example from the pos challenge set. Example from Challenge-168: ChemSpider ID 26547517, the CASMI-provided formula is C10H12ClNO2 but the InChI formula is InChI=1S/C20H24Cl2N2O4/ There are some intrinsic errors because of the InChI API. The double-MW formula case is a bit more problematic if we need to do SDF or MOL file conversions, because most in-silico software requires MOL or SDF file inputs. 
As long as the CASMI candidate structures are not affected it's probably fine; however, the statistics will of course differ from approach to approach, depending on whether a SMILES-to-MOL or InChI-to-MOL conversion route was taken. How to reproduce: 1) Convert SMILES to InChIKey with OpenBabel or ChemAxon or https://cactus.nci.nih.gov/chemical/structure Then convert InChI to InChIKey using OpenBabel or ChemAxon or https://cactus.nci.nih.gov/chemical/structure The OpenBabel results will differ from the others; there will be issues such as: Problems/mismatches: Mobile-H( Hydrogens: Locations or number, Number; Mobile-H groups: Falsely present, Attachment points; Charge(s): Do not match; Proton balance: Does not match) Check that OpenBabel matches the ChemAxon conversion, and check that both routes create the same InChIKey. 2) 7000 formula mismatches (pos Challenge): Use the provided CASMI CSV files. Extract the formula from the InChI itself, then compare row by row whether the FORMULA field matches the InChI-extracted one. I attached the file for issue (1); the one for issue (2) is too large. Cheers Tobias |
|
From: Tobias K. <tk...@uc...> - 2016-05-23 19:18:04
|
Hi, There are a number of cases where incorrect InChIKeys or InChIs were submitted. In a number of cases they are related to salts or betaines or other complicated structures. Example Challenge-88 from the CASMI CSV file: Identifier 3624293; CompoundName 6,7-Dihydroxy-2,4-dimethylchromenium; MonoisotopicMass 191.070267; MolecularFormula C11H11O3; SMILES Cc1cc([o+]c2c1cc(c(c2)O)O)C; InChI InChI=1S/C11H11O3/c1-6-3-7(2)14-11-5-10(13)9(12)4-8(6)11/h3-5,12-13H,1-2H3/q+1; InChIKey BATARTWICDUXRG-UHFFFAOYSA-N. Now open the compound in ChemSpider (the InChIKey cannot be found): http://www.chemspider.com/Search.aspx?q=BATARTWICDUXRG Now open the ChemSpider ID: http://www.chemspider.com/Chemical-Structure.3624293.html It's the same name, but the InChIKey is different (under details): BCVCWDNQJMGQIH-UHFFFAOYSA-O. This happened to many compounds and might also have influenced the rankings when, by random chance, the compound was incorrectly converted. OpenBabel sometimes creates such InChIKeys. ---- There are over 7000 cases in the 2016 challenge where the CASMI-provided formula does not match the calculated formula. For example, for the positive mode challenge candidate with ChemSpider ID 21220342, the CASMI-submitted formula says C21H22N4 but the InChI says InChI=1S/C21H20N4/ Many of them are again charged compounds, salts or isotopes. Some of the severe cases are those where the CASMI-provided formula does not match at all (+/- 2H); see this example from the pos challenge set. Example from Challenge-168: ChemSpider ID 26547517, the CASMI-provided formula is C10H12ClNO2 but the InChI formula is InChI=1S/C20H24Cl2N2O4/ There are some intrinsic errors because of the InChI API. The double-MW formula case is a bit more problematic if we need to do SDF or MOL file conversions, because most in-silico software requires MOL or SDF file inputs. 
As long as the CASMI candidate structures are not affected it's probably fine; however, the statistics will of course differ from approach to approach, depending on whether a SMILES-to-MOL or InChI-to-MOL conversion route was taken. How to reproduce: 1) Convert SMILES to InChIKey with OpenBabel or ChemAxon or https://cactus.nci.nih.gov/chemical/structure Then convert InChI to InChIKey using OpenBabel or ChemAxon or https://cactus.nci.nih.gov/chemical/structure The OpenBabel results will differ from the others; there will be issues such as: Problems/mismatches: Mobile-H( Hydrogens: Locations or number, Number; Mobile-H groups: Falsely present, Attachment points; Charge(s): Do not match; Proton balance: Does not match) Check that OpenBabel matches the ChemAxon conversion, and check that both routes create the same InChIKey. 2) 7000 formula mismatches (pos Challenge): Use the provided CASMI CSV files. Extract the formula from the InChI itself, then compare row by row whether the FORMULA field matches the InChI-extracted one. I attached the file for issue (1); the one for issue (2) is too large. Cheers Tobias |
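The row-by-row formula check in step 2 can be sketched in a few lines of Python. This is only an illustration, not the script Tobias used: the column names are taken from the Challenge-88 record quoted in this message, the sample rows reuse the (partly truncated) InChI strings exactly as quoted, and the helper names `inchi_formula` and `formula_mismatches` are made up for the example. A plain string comparison like this also flags multi-component and charged entries, so it overcounts compared with a chemistry-aware check.

```python
import csv
import io


def inchi_formula(inchi: str) -> str:
    """Return the formula layer of an InChI: the text between the first and second '/'."""
    parts = inchi.split("/")
    return parts[1] if len(parts) > 1 else ""


def formula_mismatches(rows):
    """Yield (identifier, csv_formula, inchi_formula) for rows whose formulas differ."""
    for row in rows:
        from_inchi = inchi_formula(row["InChI"])
        if row["MolecularFormula"] != from_inchi:
            yield row["Identifier"], row["MolecularFormula"], from_inchi


# Sample rows built from values quoted in this thread. The last two InChI
# strings are truncated in the source and are kept truncated here.
SAMPLE = '''Identifier,MolecularFormula,InChI
"3624293","C11H11O3","InChI=1S/C11H11O3/c1-6-3-7(2)14-11-5-10(13)9(12)4-8(6)11/h3-5,12-13H,1-2H3/q+1"
"21220342","C21H22N4","InChI=1S/C21H20N4/"
"26547517","C10H12ClNO2","InChI=1S/C20H24Cl2N2O4/"
'''

# The InChI field contains commas, so the double-quote quoting matters.
rows = list(csv.DictReader(io.StringIO(SAMPLE), quotechar='"'))
mismatches = list(formula_mismatches(rows))
for identifier, csv_formula, from_inchi in mismatches:
    print(identifier, csv_formula, from_inchi)
```

For the real candidate files, the `io.StringIO(SAMPLE)` handle would be replaced by an opened CSV file; the comparison itself stays the same.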
|
From: Steffen N. <sne...@ip...> - 2016-04-26 10:32:18
|
Dear all, For those who are wondering after Felicity’s email, the participants in CASMI2016 already have a preview of the automatic evaluation, and we have already had some feedback and modifications of the evaluation, which have changed the overall outcome in one category. This demonstrates how close (and good!) the results were this year! In case you missed it, the solutions were made public yesterday: http://www.casmi-contest.org/2016/solutions-cat1.shtml The results are very exciting, but as we all still need a few days to check them, these will be made public soon, stay tuned. Thanks, Emma, Gregory, Steffen |
|
From: Felicity A. <fel...@ua...> - 2016-04-26 10:31:22
|
Hi Emma, Certainly setting the training set introduces some extra challenges, e.g. choosing which data and deciding how big the set should be. And that may in turn affect how methods do or whether it is even feasible to re-train them on the full set, or for the slower methods whether a subset has to be used. However I think it probably is the best way of making the playing field fair, and really assessing the methods themselves rather than just which ones happened to train on data that was closest to the competition compounds. Most competitions in the machine learning world set the training data as well as the test compounds. It is more work if it means retraining a method, so could be a potential barrier to entry, but it does make the results more informative. For the alternative, and perhaps easier, suggestion of just making sure the compounds themselves are not included in the training data, I was thinking to disallow all the provided candidates, since the correct compounds are not known (and even providing a training set with only the challenge compounds left out means you could potentially cheat by not selecting any that are left in!). It means a bigger list of those disallowed, but I think there should still be plenty of other data out there to train on? Felicity On Tue, Apr 26, 2016 at 11:04 AM, Schymanski, Emma <Emm...@ea... > wrote: > Hi Felicity, > > As an organiser I wondered something similar for next time - whether we > should "recommend" the training set for those methods who would use > spectral libraries to train their data? It's clear from the current set > that we can't provide enough for sensible training, but we could have > easily have pointed you towards the data that would have been best to use > ... 
> > One possibility for the future would be to take one of the open datasets > and either omit those that are challenges, or ensure that the challenges > are not in that training set, like Kai did - except as organisers we of > course know the exact ones, Kai guessed (pretty well). > Would it be feasible to even "dictate" the training data? > Having asked that - there is currently no plan where the next set of > substances could come from and this would obviously greatly influence the > most appropriate training data. > > I would not like to exclude people who would just want to use current > versions "as is" - not many did it this time, but any user could have e.g. > used the latest CFM-ID and submitted those results, but would requiring > retraining make this no longer possible? > > Thanks, > Emma > > ------------------------------ > *From:* Felicity Allen [fel...@ua...] > *Sent:* Tuesday, 26 April 2016 11:38 AM > *To:* cas...@li... > *Subject:* [Casmi-discuss] congrats and a further thought > > Hi All, > > Congratulations to the new winners, and again to all participants and > organisers. > > One further thought on a potential rule change for the next iteration of > the competition. For category 2, I would suggest that for automated > methods, ideally all methods would train on the same data, but if that is > not possible, I would suggest that all methods should not be allowed to > include any of the candidate compounds in their training set - we're > probably all guilty of it this time, and I thought that the extra analysis Dührkop et > al did on it for this competition made a really important point. > > Thanks all for a fun competition :)! > > Best wishes, > Felicity > |
|
From: Schymanski, E. <Emm...@ea...> - 2016-04-26 10:04:23
|
Hi Felicity, As an organiser I wondered something similar for next time - whether we should "recommend" the training set for those methods who would use spectral libraries to train their data? It's clear from the current set that we can't provide enough for sensible training, but we could have easily have pointed you towards the data that would have been best to use ... One possibility for the future would be to take one of the open datasets and either omit those that are challenges, or ensure that the challenges are not in that training set, like Kai did - except as organisers we of course know the exact ones, Kai guessed (pretty well). Would it be feasible to even "dictate" the training data? Having asked that - there is currently no plan where the next set of substances could come from and this would obviously greatly influence the most appropriate training data. I would not like to exclude people who would just want to use current versions "as is" - not many did it this time, but any user could have e.g. used the latest CFM-ID and submitted those results, but would requiring retraining make this no longer possible? Thanks, Emma ________________________________ From: Felicity Allen [fel...@ua...] Sent: Tuesday, 26 April 2016 11:38 AM To: cas...@li... Subject: [Casmi-discuss] congrats and a further thought Hi All, Congratulations to the new winners, and again to all participants and organisers. One further thought on a potential rule change for the next iteration of the competition. For category 2, I would suggest that for automated methods, ideally all methods would train on the same data, but if that is not possible, I would suggest that all methods should not be allowed to include any of the candidate compounds in their training set - we're probably all guilty of it this time, and I thought that the extra analysis Dührkop et al did on it for this competition made a really important point. Thanks all for a fun competition :)! Best wishes, Felicity |
|
From: Felicity A. <fel...@ua...> - 2016-04-26 09:38:17
|
Hi All, Congratulations to the new winners, and again to all participants and organisers. One further thought on a potential rule change for the next iteration of the competition. For category 2, I would suggest that for automated methods, ideally all methods would train on the same data, but if that is not possible, I would suggest that all methods should not be allowed to include any of the candidate compounds in their training set - we're probably all guilty of it this time, and I thought that the extra analysis Dührkop et al did on it for this competition made a really important point. Thanks all for a fun competition :)! Best wishes, Felicity |
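One way to picture the suggested rule (no candidate compound in the training set) is a filter on the first block of the InChIKey, which encodes the connectivity. The sketch below is an assumption about how such a filter could look, not a procedure agreed on this list: the record layout, the function names and the two training entries are hypothetical, and matching on the first block deliberately also drops stereoisomers of a candidate. The filter is only as reliable as the keys themselves.

```python
def first_block(inchikey: str) -> str:
    """Return the 14-character connectivity block of an InChIKey."""
    return inchikey.split("-")[0]


def drop_candidates(training, candidate_keys):
    """Keep only training records whose connectivity block matches no candidate."""
    banned = {first_block(key) for key in candidate_keys}
    return [rec for rec in training if first_block(rec["inchikey"]) not in banned]


# Candidate key as quoted elsewhere in this archive; training records are
# hypothetical placeholders, not real library entries.
candidate_keys = ["BATARTWICDUXRG-UHFFFAOYSA-N"]
training = [
    {"name": "hypothetical-1", "inchikey": "BATARTWICDUXRG-UHFFFAOYSA-O"},
    {"name": "hypothetical-2", "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N"},
]

kept = drop_candidates(training, candidate_keys)
print([rec["name"] for rec in kept])
```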
|
From: Steffen N. <sne...@ip...> - 2016-04-14 12:08:09
|
Dear all, This is a quick update from our side, after submissions received by the first deadline on Monday. We now know (approximately) who our participants will be, for which categories. There is still one day left for participants to send in updated submissions, until Friday night (23:59 CEST). We have a preview run of the automatic evaluation (but of course won’t show the results yet!) and found & fixed small issues on our side, and contacted some participants to fix some issues on their end. In all categories, all participants had some (joint) challenge wins, so there are no obvious technical issues as far as we can see. All participants will get a preview of the results (once final submissions are processed) next week to comment on any possible remaining issues. **Category 1: Best Structure Identification on Natural Products** In the first category there are five external participants with manual but also (semi-)automatic approaches. So far each had at least one win, and altogether 19 correct solutions ranked first. **Category 2: Best Automatic Structural Identification - In Silico Fragmentation Only** There are four external participants and one internal one (which will not count towards the declaration of the winner). Some of these submissions were only preliminary. The median rank across all participants (so far, on these preliminary results) was 9, and there are currently 159 first ranks across all participants. **Category 3: Best Automatic Structural Identification - Full Information** In this category there will be at least three external participants (more promised until Friday) plus one internal. Again, all participants had some challenge wins, so there are no obvious technical issues as far as we can see. The preliminary results revealed a median rank across all participants of two, and there are currently 665 first ranks across all participants. That’s it for now, Your CASMI organisers |
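For readers unfamiliar with the two summary figures quoted per category, here is a minimal sketch of how a median rank and a count of first ranks across all participants could be computed. The participant names and ranks are invented for illustration, and the actual CASMI evaluation may treat ties and missing challenges differently.

```python
from statistics import median

# Hypothetical data: participant -> rank of the correct structure per challenge.
ranks = {
    "ParticipantA": [1, 3, 12, 1],
    "ParticipantB": [2, 1, 40, 9],
}

# Pool the ranks of all participants, then summarise.
all_ranks = [rank for per_challenge in ranks.values() for rank in per_challenge]
median_rank = median(all_ranks)
first_ranks = sum(1 for rank in all_ranks if rank == 1)

print(median_rank, first_ranks)
```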
|
From: Steffen N. <sne...@ip...> - 2016-04-11 15:44:02
|
Dear Celine, In the rules of category 2 [1] we say "Mass spectral libraries can only be used for training of prediction models, but not to solve the challenge by querying with the peaklist against the library." Thus for Cat2 this implies that the question "Spectral libraries" in the abstract template should be answered "no". It would be great if your actual abstract text clarifies which spectral library you used for training, and how. Hope that helped, yours, Steffen -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
|
From: Celine B. <cel...@aa...> - 2016-04-11 12:47:27
|
Dear CASMI team, I would like to know if the term "Spectral libraries" in the abstract template refers to spectra library lookup and/or to using mass spectral libraries for training. Thank you, Céline Brouard |
|
From: Steffen N. <sne...@ip...> - 2016-04-08 11:52:56
|
Hi Tobias, I can confirm that we can process submissions that mix SMILES and InChIs in arbitrary order, even within one submission file. If you want, you can send me an early preview, and I'll include it in the pre-evaluation to catch any issues. I wish you all the best, Yours, Steffen On Fri, 2016-04-08 at 01:32 +0000, Tobias Kind wrote: > Hi, > I wonder if I can submit a solution column with SMILES and InChIs mixed below each other (rows 1-20 SMILES, rows 21-30 InChI)? > Thanks > Tobias -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
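A mixed column can be told apart row by row because every InChI string starts with the prefix "InChI=". The sketch below shows that idea only; it is an assumption about one possible approach, not the organisers' evaluation code, and `structure_kind` is a hypothetical helper name. Anything without the prefix is simply treated as SMILES without being validated.

```python
def structure_kind(entry: str) -> str:
    """Classify a submitted structure string as 'inchi' or 'smiles'."""
    text = entry.strip()
    return "inchi" if text.startswith("InChI=") else "smiles"


# Both strings are the Challenge-88 example quoted elsewhere in this archive.
column = [
    "Cc1cc([o+]c2c1cc(c(c2)O)O)C",
    "InChI=1S/C11H11O3/c1-6-3-7(2)14-11-5-10(13)9(12)4-8(6)11/h3-5,12-13H,1-2H3/q+1",
]

kinds = [structure_kind(entry) for entry in column]
print(kinds)
```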
|
From: Steffen N. <sne...@ip...> - 2016-04-07 06:55:12
|
Dear Arpana, I hope I can clarify some of the points here: On Thu, 2016-04-07 at 05:33 +0200, Arpana Vaniya wrote: > Dear CASMI Team, > > As the deadline is approaching, I just want to clarify one thing about submission before I start making my result files/abstract. > > It is stated that: > > "Participants can enter a maximum of three submissions per approach and category" > > Does this just mean that I can only submit up to three candidate structures per approach that I use? Or is there no limitation on the number of answers I can provide per challenge (say for category 1)? Previously, each participant was limited to just one submission, which is a ZIP file with one sorted candidate list (of arbitrary size) for each challenge unknown spectrum. Some participants commented that they would like to use different approaches or parameter settings, with e.g. a tradeoff between specificity and sensitivity, or quick vs. extremely long computation time, or using just one program vs. combining several things. Back in 2012, we had e.g. one submission for MetFrag, and evaluated different variants only in the subsequent paper [1]. With the "three submissions per participant" you can provide three different ZIP files, and all of them will undergo the automatic evaluation. Only the best overall performing submission per participant will be considered in declaring the winner(s). This could be particularly interesting for the automated Categories 2+3. > Also on the submission format page it states, > > "If you enter a category with different submissions, use different participant names, e.g. MetFragWithCitations and MetFragWithRT." Is a submission not a list of the candidates but rather an approach used in different ways, i.e. citations vs. RT? Sorry for the confusion here. This is to help us in the automatic evaluation. 
The "participant name" is part of the filenames you submit, and each "participant name" will become a column in the output table [2] of our evaluation. In the abstract template where you briefly describe your submission, you will use the same participant name as the ParticipantID, and of course in the Authors field all real names of the team that did the work: MetFragWithCitations-Category3.txt: ParticipantID: MetFragWithCitations Authors: Emma Schymanski(1), Christoph Ruttkies(2), ... MetFragWithRT-Category3.txt: ParticipantID: MetFragWithRT Authors: Emma Schymanski(1), Christoph Ruttkies(2), ... PureMetFrag-Category2.txt: ParticipantID: PureMetFrag Authors: Emma Schymanski(1), Christoph Ruttkies(2), ... > The example of a submission has a lot of candidates listed so I > wanted to clarify the number of answers we can submit per challenge. In Category 1, you can submit a candidate list of any length for each challenge unknown spectrum. Last year, you only submitted lists of length=1, and were extremely good at that. But you missed e.g. the structure of challenge21 last year, and could have sent a longer list, where the correct one might have been "only" ranked third. You would then have won challenge21, since that would have beaten Felicity's submission, where the correct structure ranked 4th. This year for categories 2+3 it is a bit different, since we have provided the candidate lists so they are the same for all participants. This is aimed at automated approaches, so that the evaluation of different scoring methods is more comparable, and not influenced by how participants queried for the candidate lists. In summary, what you provided last year would also be a valid format this year for Category 1. I hope that helped, feel free to ask again if anything is still unclear. You can also send me a "preview" of your submission, and I'll check that the format is looking good. 
I won't tell you if you got the right structures, though :-) Yours, Steffen [1] http://www.mdpi.com/2218-1989/3/3/623/htm#3ResultsandDiscussion [2] http://www.casmi-contest.org/2014/results-cat2.shtml -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
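The naming convention and the "longer list" argument in the message above can be sketched as follows. This is only an illustration under stated assumptions: the filename pattern is inferred from the three example names in the message, the candidate identifiers are placeholders, and the real evaluation may treat ties and missing answers differently. All function names are hypothetical.

```python
import re

# Assumed pattern, inferred from names such as "MetFragWithRT-Category3.txt".
ABSTRACT_NAME = re.compile(r"^(?P<participant>[^-]+)-Category(?P<category>\d+)\.txt$")


def parse_abstract_name(filename):
    """Split an abstract filename into (participant name, category number)."""
    match = ABSTRACT_NAME.match(filename)
    if match is None:
        return None
    return match.group("participant"), int(match.group("category"))


def rank_of_correct(candidates, correct):
    """Return the 1-based position of the correct structure, or None if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if candidate == correct:
            return position
    return None


# Placeholder identifiers: a list of length 1 misses the answer entirely,
# while a longer list still places it third, ahead of a rank of 4.
short_list = ["cand_A"]
longer_list = ["cand_A", "cand_B", "cand_X"]
short_rank = rank_of_correct(short_list, "cand_X")
longer_rank = rank_of_correct(longer_list, "cand_X")
```

A lower rank is better, so a list that contains the correct structure at position 3 outperforms a competing list that has it at position 4, whereas a list that omits it cannot be ranked at all.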
|
From: Nikolic, D. <dn...@ui...> - 2016-04-04 16:31:43
|
Dear CASMIers, Please check the CASMI website for the updated results of CASMI 2014. During the writing of the overview paper, some errors were discovered regarding the structures of Challenges 14 and 29. In addition, results for the unknown challenges 43-48 are now posted. I would also like to inform you that the Proceedings of CASMI 2014 are currently under review for publication in Current Metabolomics. The Thematic Issue will have five papers, including the overview paper from the organizers. Sincerely, CASMI 2014 Organizing Team |
|
From: Steffen N. <sne...@ip...> - 2016-04-04 14:30:40
|
Dear Arpana, I wanted to point you to the *.mzXML raw data, because the polarity is part of the <scan> information. Unfortunately, the mzXML converter used here (MassWolf) simply provided: <scan num="2" msLevel="1" peaksCount="23571" polarity="any" ^^^^^ ! which is entirely useless. So I had to go back to the *.raw raw data, where for Waters there is a human-readable description: CASMI_11012016_14.raw/_extern.inf:Polarity ES+ CASMI_11012016_15.raw/_extern.inf:Polarity ES+ CASMI_11012016_19.raw/_extern.inf:Polarity ES+ So the short answer is: they are all positive mode, and the long answer is: we need to make sure we get the raw data from vendors in open formats *with all information* included (see [1], esp. the Conclusion at the end). Great to see you in CASMI again, Yours, Steffen [1] http://link.springer.com/article/10.1007/s11306-015-0879-3/fulltext.html On Mon, 2016-04-04 at 05:02 +0200, Arpana Vaniya wrote: > Dear CASMI Team, > > Just wondering if Challenge 5, 8 and 9 in Cat 1. is the data in POS > or NEG mode? Also which adduct is it since that information has been > given for the other challenges as well. Thank you so much. > > Arpana Vaniya -- IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409 |
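The lookup described above can be sketched in a few lines for the grep-style output quoted in the message (`<raw directory>/_extern.inf:Polarity ES+`). This is an illustration rather than a tool used by the organizers: the mapping of a trailing `+` to positive mode follows the message, while the handling of a trailing `-` as negative mode is an assumption, and the function name is hypothetical.

```python
import re

# Matches lines such as "CASMI_11012016_14.raw/_extern.inf:Polarity ES+".
GREP_LINE = re.compile(r"^(?P<raw>.+?)/_extern\.inf:Polarity\s+(?P<mode>\S+)\s*$")


def polarity_from_grep_line(line):
    """Return (raw directory, polarity) for one grep output line, or None."""
    match = GREP_LINE.match(line.strip())
    if match is None:
        return None
    mode = match.group("mode")
    if mode.endswith("+"):
        polarity = "positive"
    elif mode.endswith("-"):
        # Assumption: negative mode is written with a trailing minus sign.
        polarity = "negative"
    else:
        polarity = "unknown"
    return match.group("raw"), polarity


lines = [
    "CASMI_11012016_14.raw/_extern.inf:Polarity ES+",
    "CASMI_11012016_15.raw/_extern.inf:Polarity ES+",
    "CASMI_11012016_19.raw/_extern.inf:Polarity ES+",
]
modes = [polarity_from_grep_line(line) for line in lines]
```

Returning `"unknown"` instead of guessing mirrors the point of the message: a value such as `polarity="any"` carries no usable information and should be flagged rather than silently accepted.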
|
From: Steffen N. <sne...@ip...> - 2016-02-15 12:10:47
|
Dear all, At the Dagstuhl seminar last year we started a discussion on a possible joint input format for the in silico identification tools [1]. In brief, the suggestion was to use MGF as a possible query file format. For the CASMI category 2+3 challenge data there are now MGF files in addition to the plain peak lists. While the MGF files for the training data do not yet contain the identity of the correct solution (that is currently in a separate CSV table), we could use cas...@li... to decide what additional information the MGF format should contain, and then update the training data. Feedback welcome! Yours, Steffen and Emma [1] https://docs.google.com/document/d/1_5nXwTZyl6Ydhz766tfLEOQ7TZJhnxPsjytVzfEC7uU/edit# |
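For readers unfamiliar with MGF as a query format, here is a minimal reader sketch. It assumes only the general MGF layout (spectra enclosed in `BEGIN IONS` / `END IONS`, `KEY=VALUE` header lines, and whitespace-separated m/z and intensity pairs); it is not the code used to produce the CASMI files, and the example spectrum and its values are invented for illustration.

```python
def read_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': list} entries."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line, e.g. PEPMASS=195.0877
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                parts = line.split()
                if len(parts) >= 2:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


# Hypothetical example spectrum; the values are placeholders, not CASMI data.
example = """BEGIN IONS
TITLE=example-001
PEPMASS=195.0877
CHARGE=1+
110.0712 35.0
138.0662 100.0
END IONS
"""
spectra = read_mgf(example)
```

Since header lines are stored as generic key/value pairs, any additional fields the list agrees on for the training data would be picked up by such a reader without changes.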
|
From: Samuel B. <sam...@un...> - 2016-02-15 08:24:28
|
Thanks a lot for your reactivity. the computer will start soon its work. best regards Samuel ------------------------------------------------------------------------ Samuel BERTRAND, Maitre de conférence (Assistant Professor) Ph.D. Chimie, M.Sc. Chimie (ENSCL) Université de Nantes -> UFR des Sciences Pharmaceutiques et Biologiques, Tel: +33(0)2 53 48 43 10 (interne: 33 43 10) -> Laboratoire Mer, Molécules, Santé (EA 2160), Tel: +33(0)2 51 12 56 89 (interne: 45 56 89) 9 rue bias BP 53508, 44035 Nantes cedex 1, France https://www.researchgate.net/profile/Samuel_Bertrand/ http://www.mms.univ-nantes.fr <http://www.mms.univ-nantes.fr/23221706/0/fiche___pagelibre/&RH=MMS_FR1&RF=1352222888411> ------------------------------------------------------------------------ http://bertrandsamuel.free.fr/gp2a/ http://bertrandsamuel.free.fr/siderophore_base/ http://scholar.google.com/citations?user=Y4_9oyMAAAAJ |
|
From: Schymanski, E. <Emm...@ea...> - 2016-02-15 07:41:20
|
Dear Samuel and all, The raw data (as mzML, centroided) for Challenges 10 to 19 is now available, along with MS1 peak lists extracted from Xcalibur without any filtering/interpretation whatsoever. Note that some of the substances are very low intensity. The retention times, precursor m/z and ionization for each challenge can be found on the website next to the raw file link: http://www.casmi-contest.org/2016/challenges-cat1.shtml#10to19 Thanks to Steffen for being the webmaster over the weekend! I hope Gregory can help you out with Challenges 1-9 soon. Thanks, Emma °°° Dr. Emma Schymanski Eawag – Umweltchemie/Environmental Chemistry Überlandstrasse 133 8600 Dübendorf Schweiz/Switzerland Tel. +41 (0)58 765 55 37 Fax. +41 (0)58 765 58 26 emm...@ea...<mailto:emm...@ea...> http://www.eawag.ch/~schymaem |
|
From: BERTRAND S. <sam...@un...> - 2016-02-12 18:26:10
|
Hello, Thanks for your answer. If this comment starts a fruitful discussion, all the better. Many of us have different goals in CASMI. Concerning Category 1, I am more interested in raw LC-MS data in any format. It is just that centroid mode is more convenient, and profile-mode data are not easy to convert, depending on the acquisition instrument (Agilent/Waters/Thermo/Shimadzu). Providing the data as was done last year could be a compromise, but cleaned data already change the real challenge of identifying a compound from raw data. It would be interesting for me to have the raw data as mzXML, for example, in centroid mode. I will do my best to answer the Category 1 challenge, which is already a challenging task from raw data. Bye Samuel _______________________________________ Samuel BERTRAND, Maitre de conférence Ph.D. Chimie, M.Sc. Chimie (ENSCL), Université de Nantes, ->UFR des Sciences Pharmaceutiques et Biologiques, Tel: +33(0)2 53 48 43 10 (interne: 334310) ->Laboratoire Mer, Molécules, Santé (EA 2160) https://www.researchgate.net/profile/Samuel_Bertrand/ http://bertrandsamuel.free.fr/gp2a/ http://bertrandsamuel.free.fr/siderophore_base/ http://www.researcherid.com/rid/G-4484-2010 http://scholar.google.com/citations?user=Y4_9oyMAAAAJ http://www.biomedexperts.com/Profile/ResearchProfile.aspx?pid=2008642 > On 12 Feb 2016, at 11:32, Schymanski, Emma <Emm...@ea...> wrote: > > Dear Samuel, > > Many thanks for your feedback and starting discussions - this is really valuable! > We only received this additional data at very short notice and are working hard to make this good for as many participants as possible. The software I used to extract the MS/MS data does not automatically do this for MS1 yet, which is why isotope and adduct information is not yet available (we are already discussing how we can auto-create MS1 files in the near future). If this data is interesting, especially for Category 1, we can look into providing this. 
> Two options: a raw extraction (e.g. export the MS1 scan immediately preceding the MS/MS scan, completely unfiltered, as a peak list) or a form of "cleaned-up" peak list - which would be feasible for the 10 Category 1 datasets but not for the whole dataset for Category 2 & 3. The former has the advantage of no "pre-interpretation", the latter is what has generally happened in the past. Which is better (more "realistic")? > > Would you (and others) also be interested in the raw (mzML) data for Challenges 10-19 of Category 1? > We have these files, but were not sure whether to upload them or not. It is difficult for us to anticipate which data is needed by different people - we are happy to help where we can. > > Thanks again for your feedback, much appreciated! > Emma > > From: Samuel Bertrand [sam...@un...] > Sent: Friday, 12 February 2016 10:48 AM > To: cas...@li... > Subject: [Casmi-discuss] DataSet for CASMI > > Dear all, > > I am working on the CASMI for this year, mostly Category 1. > > I would just like to share a concern about the direction in which CASMI is moving. > From my point of view, the challenge is moving closer and closer to the developers of algorithms for MS/MS interpretation and away from the scientists who could be interested in using them. > The data look less and less "real" in the form in which we download them from the CASMI webpage. For example, the second-round data only include the main ion and MS2 fragmentation, with no adduct or isotope information to challenge a full workflow. > > I can understand the interest for computer science in working on the best strategies to identify compounds from those data, and I see the point of these data. > The consequence is that the people who could be interested in the challenge itself will be largely limited to those doing computer science. > > It seems that this year CASMI is generating a lot of discussion, but unfortunately not on the CASMI mailing list. 
> > This is just a comment to remind you that potential users of the CASMI developments may not understand them and therefore may not use them. > > One more thing concerning the data set provided: > - I tried to use the raw data from Category 1 challenges 1-9; as they are in profile mode, conversion to centroid mode remains difficult without the Waters/Agilent algorithm. The generic ProteoWizard conversion is far from perfect and generates a lot of noise. Therefore it would be good to provide the centroid raw data converted by the appropriate algorithm. This is what people generally do with their instruments in practice. The added peak lists are much easier to use, but they do not correspond to real raw data as obtained on the machine. > - From my point of view, Category 1 challenges 10-19 are too disconnected from real data (no adducts or isotopes). > > In conclusion, the Category 1 data are difficult to use all together to challenge a complete workflow. > > In addition, have you thought of providing the biological source and LC protocol for Category 1 at some stage of the challenge (e.g. after the first submission)? > This could be of interest to natural product chemists. > > best regards > Samuel > > > -- > Samuel BERTRAND, > > Maitre de conférence (Assistant Professor) > Ph.D. Chimie, M.Sc. Chimie (ENSCL) > > Université de Nantes > -> UFR des Sciences Pharmaceutiques et Biologiques, Tel: +33(0)2 53 48 43 10 (interne: 33 43 10) > -> Laboratoire Mer, Molécules, Santé (EA 2160), Tel: +33(0)2 51 12 56 89 (interne: 45 56 89) > 9 rue bias BP 53508, 44035 Nantes cedex 1, France > > https://www.researchgate.net/profile/Samuel_Bertrand/ > http://www.mms.univ-nantes.fr > http://bertrandsamuel.free.fr/gp2a/ > http://bertrandsamuel.free.fr/siderophore_base/ > http://scholar.google.com/citations?user=Y4_9oyMAAAAJ > > |