Samples from Daylight theory page http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
Converting from these normal SMILES
OC(=O)C(Br)(Cl)N
ClC(Br)(N)C(=O)O
O=C(O)C(N)(Br)Cl
Two different canonical SMILES are produced:
O=C(O)C(Cl)(Br)N
O=C(O)C(Br)(Cl)N
Charlie
Unit test added
On investigation, the problem appears to stem from the initial labels of Cl and Br being identical. As a result, the initial sort and ranking give Cl and Br the same inv pair of 11000. As the algorithm expands the neighborhood, the neighborhoods of both atoms are also identical, so the tie is never broken. Thus the final inv pair depends on whether Cl or Br came first in the original SMILES.
Actually, this is easily fixed by noting that if the input molecule does not have its atomic numbers configured, the atomic number portion of the initial inv label is 0, which is wrong. Instead, if it is not configured, we pull the atomic number from the PeriodicTable and then carry on. As a result, the canonical SMILES are identical. I'll upload a patch to fix this in a bit.
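To illustrate the tie and the fallback described above, here is a minimal, self-contained sketch. It does not use the CDK classes: the lookup table, the method names, and the invariant layout (connection count followed by a three-digit atomic number) are assumptions made for illustration only, and the real labeler packs more fields into the label.

```java
import java.util.Map;

public class InitialInvariantSketch {

    // Hypothetical stand-in for a periodic-table lookup (symbol -> atomic number).
    static final Map<String, Integer> ATOMIC_NUMBERS =
            Map.of("C", 6, "N", 7, "O", 8, "Cl", 17, "Br", 35);

    // Simplified invariant: connection count followed by a three-digit atomic number.
    // This layout is an assumption, not the labeler's actual format.
    static long invariant(int connections, int atomicNumber) {
        return connections * 1000L + atomicNumber;
    }

    // Use the configured atomic number when present (non-zero),
    // otherwise fall back to a lookup by element symbol.
    static int resolveAtomicNumber(String symbol, int configured) {
        if (configured != 0) {
            return configured;
        }
        return ATOMIC_NUMBERS.getOrDefault(symbol, 0);
    }

    public static void main(String[] args) {
        // Unconfigured atoms: atomic number is 0 for both, so Cl and Br tie.
        long clUnconfigured = invariant(1, 0);
        long brUnconfigured = invariant(1, 0);
        System.out.println("tie without lookup: " + (clUnconfigured == brUnconfigured));

        // With the symbol-based fallback, the two invariants differ.
        long cl = invariant(1, resolveAtomicNumber("Cl", 0));
        long br = invariant(1, resolveAtomicNumber("Br", 0));
        System.out.println("Cl=" + cl + " Br=" + br);
    }
}
```

With distinct initial invariants, the ranking of Cl and Br no longer depends on their order in the input SMILES.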
The patch will take a while to work out due to issues not related to this bug. A simple workaround is to ensure that the molecule is appropriately configured; in this case, ensure that atomic numbers are configured by doing:
IsotopeFactory fact = IsotopeFactory.getInstance(DefaultChemObjectBuilder.getInstance());
fact.configureAtoms(molecule);
As a follow-on, I am closing this bug and filing a more specific bug for the canonical labeler.