SmilesGenerator has a bug when the molecule contains a charged, aromatic planar3-nitrogen and useAromaticity is set to true.
Change line 1663 (approximately) from
if (a.getSymbol().equals("N") && a.getHybridization() == IAtomType.Hybridization.PLANAR3 && container.getConnectedAtomsList(a).size() != 3) {
To:
if (a.getSymbol().equals("N") && a.getHybridization() == IAtomType.Hybridization.PLANAR3 && container.getConnectedAtomsList(a).size() != 3 && a.getFormalCharge() == 0) {
i.e., add an additional check for the atom's formal charge:
a.getFormalCharge() == 0
Molecule ZINC58167940 can be used for testing (see attachment).
To be clear, the explicit hydrogen on the lower case nitrogen is a must; that's not the bug.
However, the generator currently does not integrate this correctly with the charge information, which causes the double brackets.
So, rather than outputting:
[[nH]-]
... it should output instead:
[nH-]
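A minimal sketch of the intended formatting, not the actual SmilesGenerator code: the element symbol, hydrogen count and charge are all written inside one pair of brackets, instead of wrapping an already bracketed atom a second time. The class and method names below are hypothetical and are not part of the CDK API.

```java
final class BracketAtomSketch {

    /**
     * Builds a single SMILES bracket atom such as "[nH-]".
     * Hypothetical helper for illustration only.
     */
    static String bracketAtom(String symbol, int hydrogenCount, int formalCharge) {
        StringBuilder sb = new StringBuilder("[");
        sb.append(symbol);

        // Hydrogen count: "H" for one hydrogen, "H2", "H3", ... for more.
        if (hydrogenCount == 1) {
            sb.append('H');
        } else if (hydrogenCount > 1) {
            sb.append('H').append(hydrogenCount);
        }

        // Charge: sign first, then the magnitude if it is larger than one.
        if (formalCharge != 0) {
            sb.append(formalCharge > 0 ? '+' : '-');
            int magnitude = Math.abs(formalCharge);
            if (magnitude > 1) {
                sb.append(magnitude);
            }
        }

        // Exactly one closing bracket, matching the single opening bracket.
        return sb.append(']').toString();
    }
}
```

With this sketch, bracketAtom("n", 1, -1) yields [nH-] and bracketAtom("n", 0, -1) yields [n-].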
First off, I'm not a chemist, so what I say might be wrong.
ZINC lists Cc1cc(c(n1c2[n-]ncn2)C)c3csc(n3)CCCOC as the SMILES for this molecule, and that SMILES can be converted back to the same image as displayed in ZINC using, for example, the identifier resolver.
In the molfile, the nitrogen has no bond to any hydrogen, even though hydrogens are listed explicitly in the file. So I think the H does not belong there, and [n-] is correct for this specific molecule (or [nH] with no charge)?
EDIT: It is charged in ZINC because of the reference pH used:
http://zinc.docking.org/substance/58167940
So [n-] seems to be correct for this specific case, or [nH] uncharged at a different pH.
Last edit: Joos Kiener 2013-05-30
You're right. There were two issues: one was the double brackets, but it should not have a hydrogen anyway...
I created a patch for the two issues: https://sourceforge.net/p/cdk/patches/641/
The first two patches fix the double bracket problem, and the second is a variation of the patch suggested by Joos. Besides testing whether the charge is 0, we should also test whether it is unset (null), in which case we can assume it is zero.
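A rough sketch of that null-tolerant check, assuming the formal charge is held as a boxed Integer (which the "unset (null)" case implies). Comparing a null Integer with == 0 would throw a NullPointerException on unboxing, so the null test has to come first. The helper name is hypothetical and this is not the code from the patch.

```java
final class ChargeCheckSketch {

    /**
     * Returns true when the formal charge is zero or unset.
     * An unset (null) charge is assumed to mean "no charge".
     * Hypothetical helper for illustration only.
     */
    static boolean isNeutral(Integer formalCharge) {
        // The null test must come first: unboxing a null Integer would throw.
        return formalCharge == null || formalCharge == 0;
    }
}
```

In the condition quoted at the top of this ticket, such a helper would stand in for the bare a.getFormalCharge() == 0 comparison.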
Fixed by patch 641