extracting the substructure patterns of CircularFingerprint. #231

johnmay · 2016-08-18T15:05:27Z

Clean up of #224

johnmay · 2016-08-18T15:38:04Z

Made some minor clean up, some other quick questions. For uncharged atoms it should probably output '+0' for the charge layer. This makes all the atoms 'complex' SMARTS expressions but is more correct. Likewise, a single bond should be output between aromatic atoms if it's not aromatic.

johnmay · 2016-08-18T15:46:09Z

For CCC shouldn't the SMARTS produced be:

CC* and C(*)C

are the same?

Explicit SMARTS might capture more the features:

r=0
[CX4v4H3+0]   C*
[CX4v4H2+0]   C(*)*

r=1
[CX4v4H3+0][CX4v4H2+0]  CC* and C(*)C
[CX4v4H3+0][CX4v4H2+0][CX4v4H3+0] CCC

vedina · 2016-08-19T07:27:34Z

@ntk73 knows better, but we might not have all the info in place (from the internal CircularFingerprints structures) in order to generate SMARTS with some level of details. Regarding explicit SMARTS [CX4v4H3+0] vs C* I am not sure explicit SMARTS is what we want, as this feature is implemented as alternative for JCompoundMapper one and I don't recall SMARTS like CX4v4H3+0] there (to be checked). Could be an option though.

joergkurtwegner · 2016-08-19T08:04:54Z

Keeping it simple seems to be easier, though some patterns might be mapped to the wrong molecule, e.g. depending if CCC is a chain or might contain a branch. Then again, since also all methods learning on such patterns would not distinguish, this is what the generalization is about. Thus, the simpler version might just do. If more details are required I would provide an option to put it into the $(environment) of the SMARTS matching routine and make it an optional choice.

johnmay · 2016-08-19T09:22:49Z

Okay so a quick question, are these SMARTS meant to be used for building models or the matching molecules?

@vedina I realised at the gym, that the internal structures are actually cause a bit of a problem. Is the intention these patterns are searchable because that's not possible? Alex implemented the class as a standalone and thus uses it's own internal aromaticity model.

Here are the SMARTS for indole:

N(*)*
C(*)=*
C(=*)*
c(*)(:a):a
c(:a)(*):a
c(:a):a
c(:a):a
c(:a):a
c(:a):a
N(C=*)c(:a):a
N(C=C*)*
C(*)=Cc(:a):a
C(=*)c(c(*):a)c:a
N(*)c(c(*):a)c:a
c(:a)(*)cc:a
c(:a)cc:a
c(:a)cc:a
c(*)(:a)cc:a
N1C=Cc(c1c:a):a
N1C=Cc(c1:a):a
N1C=Cc(c1:a)c:a
N1C=Cc(c1c:a)cc:a
N1C=Cc(c1cc:a)c:a
N(*)c(c(*):a)ccc:a
c(:a)(*)cccc:a
c(*)(:a)cccc:a
C(=*)c(c(*):a)ccc:a
N1C=Cc(c1c:a)c:a
N1C=Cc2c1cccc2
N(*)c1c(*)cccc1
C(=*)c1c(*)cccc1

Although in theory you can use any aromaticity definition in SMARTS, it's best to use Daylight's definition. This ensures portability between implementations and is the default in CDK, am guessing AMBIT's SMARTS? You can work around it but it's a pain. Separate this out would decouple it from the FP internals and still be correct.

@joergkurtwegner circular fingerprints do better on benchmarks than path based fingerprints precisely because they distinguish branches. In fact you can encode heavy degree in a path based fingerprint it will perform similarly to circular ones!

Side note: OEChem has a nice API for doing this:
OEFPAtomType - notice the defaults for paths. If you want paths and branches then a tree based fingerprint should be used, we don't yet have that available.

Given the more precise SMARTS it can be transformed to the more loose form but not vice versa. In it's current implementation the SMARTS will actually distinguish branching: e.g. CC(C)C.

johnmay · 2016-08-19T09:25:56Z

Here's the SMARTS if you use daylight atom:

n(:a):a
c(:a):a
c(:a):a
c(:a)(:a):a
c(:a)(:a):a
c(:a):a
c(:a):a
c(:a):a
c(:a):a
n(c:a)c(:a):a
n(c=c:a):a
c(:a)=cc(:a):a
c(:a)c(c(:a):a)c:a
n(:a)c(c(:a):a)c:a
c(:a)(:a)cc:a
c(:a)cc:a
c(:a)cc:a
c(:a)(:a)cc:a
n1c=cc(c1c:a):a
n1c=cc(c1:a):a
n1c=cc(c1:a)c:a
n1c=cc(c1c:a)cc:a
n1c=cc(c1cc:a)c:a
n(:a)c(c(:a):a)ccc:a
c(:a)(:a)cccc:a
c(:a)(:a)cccc:a
c(:a)c(c(:a):a)ccc:a
n1c=cc(c1c:a)c:a
n1c=cc2c1cccc2
n(:a)c1c(:a)cccc1
c(:a)c1c(:a)cccc1

vedina · 2016-08-19T09:31:21Z

@johnmay the use case is that models are typically built directly with (ECFP) fingerprints, and SMARTS are for matching molecules after the models identify the "best" fingerprints (i.e. making the models interpretable).

Regarding internal structures - neither me nor @ntk73 are original authors of the CircularFingerprint , so we just used what's there. Indeed the CircularFingerprint authors decided to use their own atom typing and aromaticity detection. I agree if we generate SMARTS with one type of aromaticity detection (CircularFP) and use for matching in another one (Daylight, CDK default, etc) it will be inconsistent in some cases, but think for the sake of simplicity I would prefer to leave as it is. As already said, it may even benefit the model generalization (certain level of noise in descriptors is helpful in theory and practice).

(this thread aside, I agree one can get comparable performance with path fingerprints if encoding branching :) )

johnmay · 2016-08-19T10:14:10Z

Then using the molecular rather than the fingerprint aromaticity is preferable I think?

vedina · 2016-08-19T10:18:27Z

@johnmay yes, though this means we'll have to ignore the CircularFingerprint internal aromaticity and match the atoms in the fingerprint back to the original molecule to use the CDK aromaticity flags. @ntk73 do you think it's possible ?

johnmay · 2016-08-19T10:58:19Z

I made the modification to print the output listed above. This also means it no longer needs to be in the same class.

vedina · 2016-08-19T11:13:16Z

John, do I understand right you already have the modification to use the CDK aromaticity when printing the SMARTS ( could you point to the code) ? May be it's best to have both as different options? Otherwise, fine to have the printing in a separate class.

(having in mind that SMARTS may be in principle visualised by arbitrary tools, not necessarily CDK, things might get even more inconsistent than discussed here)

johnmay · 2016-08-19T11:27:22Z

If you use the SMARTSPattern class it will use Daylight aromaticity. If you use the Pattern class with an IQueryAtomContainer it will use the aromaticity as defined on the molecule. You will also need to set the SMARTS invariants. The difficult here is the aromaticity model is locked inside the fingerprint and I don't want to expose that. You can mimmic it with this but there's no guarantee it will match exactly.

Do not use this! It is not correct and does not match exactly

 SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());

        IAtomContainer mol = smipar.parseSmiles("[nH]1ccc2c1cccc2");

        // XXX! doesn't not match the semantics of CircularFingerprint aromticity model
        Aromaticity arom = new Aromaticity(ElectronDonation.piBonds(),
                                           Cycles.all(6));

        // set up SMARTS invariants
        SmartsMatchers.prepare(mol, true);

        Pattern ptrn = Pattern.findSubstructure(SMARTSParser.parse("N1C=Cc(c1c:a):a", null));
        System.err.println(ptrn.matches(mol));
        arom.apply(mol);
        System.err.println(ptrn.matches(mol));

I really thing the correct option is just to write out the aromaticity flags on the mol and not in the fingerprint. Otherwise it's somewhat of a house of cards.

vedina · 2016-08-19T11:31:27Z

@johnmay - I am probably missing something, but how one can use either the SmartsPattern or Pattern class on the internal CircularFingerprint structures and not on the IAtomContainer ? @ntk73 just confirmed he is NOT using the molecule to write the SMARTS, but the internal CircularFingerprint arrays.

johnmay · 2016-08-19T11:36:44Z

Sorry I realised I didn't answer the question - that example was if in the current form what you'd have to do to use the SMARTS produced. Here's the changes:

in nodeToString(int atom)

--- if (bondArom[neighborBo])
+++ if (mol.getBond(neighborBo).isArom())

in getAtomSmarts()

--- if (atomArom[atNum])
+++ if (mol.getAtom(atNum).isArom())

ntk73 · 2016-08-19T11:49:11Z

John, I think that it is possible (although rare) to have some side effects from you last changes. We intentionally inclined to use the locally defined aromaticity within SMARTS generation because it is used for the fragments identification as well.
Now it might happen that two different fragments identified by the CircularFingerprint get the same SMARTS because the different aromaticity detection.

vedina · 2016-08-19T11:54:00Z

... or to have the same fingerprint associated with different SMARTS for different molecules.

(but as said noisy descriptors are not a big deal. and we are not going to solve the different chemistry across all toolkits anyway, so let's have what's more appropriate from the CDK point of view. )

johnmay · 2016-08-19T11:54:55Z

@ntk73 Indeed but I think this is actually what you want. Fingerprints can set multiple bits for the same parts, Daylight for example set a variable number depending on the size of the fragment, RDKIt sets a fixed number. It's actually kind of a bug that CDK doesn't do it in the path based fingerprint. Andrew Dalke has a write up somewhere on this...

egonw · 2016-08-21T07:39:49Z

Hi all, sorry for being late to the party... when we pulled in the CircularFingerprint, they expressed a very strong wish to not swap internal algorithms, so that their validated implementation matches exactly the upstream implementation. Changing SMARTS, the aromaticity models, etc, clearly would change the calculated results, and making it break from the expected (upstream) results... when they pushed, I indicated to prefer it would reuse existing algorithms from the CDK library for various properties, but they insisted on using their own for the above reasons.

Please make sure to include Alex Clark in this discussion!

vedina · 2016-08-21T10:28:37Z

OK, @aclarkxyz , please comment (I hope it's the right github account).

We are not changing the aromaticity model (or anything else) of generating fingerprints, just assigning SMARTS for each fingerprint for interpretability.

I am afraid if we want the generated SMARTS to follow the internal CDKFingerprinter view of the chemistry, we'll need to provide a separate SMARTS parser and subgraph isomorphism matching at the worst case. So what John proposed is a compromise.

aclarkxyz · 2016-08-21T12:38:03Z

What @egonw said. It's absolutely critical that the CircularFingerprint results never change: if they start giving different results, then the entire value proposition goes down the drain. Because they don't make use of any potentially evolving library algorithms like the general purpose aromaticity detector code, and means you don't have any additional restrictions on what you can or can't do with them.

vedina · 2016-08-21T13:22:57Z

@aclarkxyz this pull request implements an extension, which does not change the CircularFingerprint, but adds a function to generate SMARTS per fingerprint. Could you please comment on this? If it's not OK to add the function to this class, we may move the extension into a new subclass.

aclarkxyz · 2016-08-21T14:19:40Z

Wouldn't it be a much better idea to put the SMARTS features into a different file? The circular fingerprinter implementation is already fairly long, is tightly self-contained, and does exactly one thing (and meticulously carefully, might I add). Adding a bunch of peripheral functionality to it doesn't seem like the best call, especially since there's no way to make sure that the additions are actually describing the same concepts ("close enough" is as close as you will ever get, so I don't think it qualifies as core content).

johnmay · 2016-08-21T15:23:38Z

Nina/Nikolay are you happy for me move the code to a separate class and use the models aromaticity flags: CircularFingerprintSmarts okay?

aclarkxyz · 2016-08-21T15:27:29Z

Unless I'm missing something from only having skim-read the code, what you have is basically {molecule, atom indices} -> {SMARTS string}, with a side-note that the output should try to be compatible with the CircularFingerprinter's aromaticity model, if possible (which it isn't, so I wouldn't worry too much about it). That sounds like a very generic algorithm that doesn't necessarily need to be associated with this particular fingerprint at all, except by historical coincidence.

johnmay · 2016-08-21T15:45:42Z

@aclarkxyz kind of... locking in valence, connectivity, and charge as discussed above, it could be general. However it it's current implementation it is specific to circular fingerprints as includes the next iterations neighbour information.

For example for benzene the iteration (ECFP0) will have a single atom index but SMARTS will have three atoms.

c(:a):a d=0
c(c:a)c:a d=2
c(cc:a)cc:a d=4
c1ccccc1 d=6

vedina · 2016-08-22T05:51:15Z

In principle SMARTS generation is a generic algorithm, but in this case it's indeed quite locked to the internal representation.
@johnmay fine to move to external class if you can disentangle the generation code from the internal structures it uses ...
@johnmay , @aclarkxyz - OK to use either of the aromaticity models for SMARTS generation. It's a compromise; neither of the options are perfect, and an ideal solution needs reconciliation of the aromaticity models, which is not going to happen either way.

Otherwise it would be nice to have a common interface for such substructure generating functionality for other fingerprints as well, I'm happy to help if we go this way.

johnmay · 2016-08-22T09:22:34Z

Already extracted Nina, but I'm now wondering with Alex's comment if this is a bit specific and is better kept down stream, JCompoundMapper/AMBIT? I can integrate the SMARTS generation parts which would simplify the method but since the API in the CircularFingerprint already exposes which atom indices are encoded it's a simple transformation from that information to the SMARTs.

It's difficult to judge whether this is library functionality or application specific.

Thoughts?

vedina · 2016-08-22T13:22:18Z

Well, I don't think it specific, it's core functionality of a fingerprint, to be able to see the substructure of each fingerprint (I admit this is not always possible). I definitely think this is NOT to be handled downstream. JCompoundMapper has own implementation of both circular fingerprint and SMARTS generation, so it is not at all interested in such functionality. I would say it's vice versa, theoretically one can port the many JCompoundMapper fingerprints into CDK and ensure they all have the "interpretability" feature.
Of course if CDK doesn't want to have such functionality, we will consider moving it into AMBIT, but I thought it would be beneficial functionality of a core library.

Otherwise we have few other fingerprints in mind where we would like to make them "interpretable" and I was initially looking for implementing a common interface, but learned no such interface exist at the moment in CDK.

aclarkxyz · 2016-08-22T13:43:52Z

The circular fingerprints are already interpretable: atom indices for each fingerprint means you already have the information you need to do some pretty interesting things (e.g. https://cheminf20.org/2015/08/31/visualisation-of-structure-activity-models-fudging-it-with-a-widget/). That's the kind of low level minimalism you get from the algorithm itself, because it can't be done anywhere else, and doesn't add anything that isn't essential. Converting that into SMARTS is very much an interoperability feature that's not at all part of the core functionality... and there could be dozens of other use cases for it. Seems to me like it should be named accordingly, e.g. "SubfragmentQuery" or somesuch. If other fingerprints could be minimally augmented so that they reveal atom indices, then they could be channelled into the new feature. (The reason for this abstraction would be the the molecule+indices definition is literally correct; whereas shoehorning that into SMARTS is technically incorrect - for this algorithm, at least - and is a high level interpretation with a failure rate that may or may not be acceptable for a given use case... that's a decision to be made elsewhere, not inside the fingerprinter library itself.)

mrwns · 2016-09-11T15:34:18Z

@johnmay , I have slightly modified the SubstructureSmarts extractor to optionally also include information about valency/degree, implicit H count and atom-mapping, so it can be used to extract reaction rules. Let me just find out how I can share it, as I have not used github yet to push code.

…e class. Copies exact functionality ATM.

…bonds, it is redundant.

…to find (start with 'Smarts' prefix). Introduce the MODE_EXACT to specify setting the exact valence. We use int constants rather than enums to avoid adding two classes to the public API.

johnmay · 2016-10-07T08:51:24Z

Sorry for delay - all done now, rebased on master, renamed the class, and added the option (the default) to produce correct SMILES. To match the behaviour discussed in this thread MODE_JCOMPOUNDMAPPER can be set.

… now resolved.

johnmay · 2016-10-07T21:01:28Z

@egonw you okay to do the merge on this one? - I did a lot of modifying in the end.

egonw · 2016-10-08T07:12:49Z

I'll look at it today.

egonw

Only major thing is the missing CLW header in one of the files. Others I leave to the authors to decide on.

egonw · 2016-10-09T07:32:15Z

...fingerprint/src/test/java/org/openscience/cdk/fingerprint/CircularFingerprintSmartsTest.java

@@ -0,0 +1,126 @@
+package org.openscience.cdk.fingerprint;


Please add the copyright header here.

done - missed that on the first time through

egonw · 2016-10-09T07:37:41Z

tool/smarts/src/main/java/org/openscience/cdk/smarts/SmartsFragmentExtractor.java

+ * needs to match the nitrogen.
+ *
+ * <p><b>Basic Usage:</b></p>
+ * <pre>{@code


What is "{@code"? Should we start using that for all JavaDoc? If so, sounds like a nice Junior Job...

It should be already used. Main advantage IIRC is you can write:

<pre>{@code List<IAtom> atoms = new ArrayList<IAtom>(); }</pre>

instead of

<pre> List<IAtom> atoms = new ArrayList<IAtom>(); </pre>

I meant, should we check all code examples and add it where it is not used yet? (not for this PR!)

I think that's the only difference, you can find them with this:

find ~/workspace/github/cdk -name '*.java' -exec grep -H '<' {} \;

/Users/john/workspace/github/cdk/cdk/app/depict/src/test/java/org/openscience/cdk/depict/SvgDrawVisitorTest.java: + " <text x='50.0' y='50.0' fill='#FF0000' text-anchor='middle'>PNG < EPS < SVG</text>\n" org/openscience/cdk/graph/AllPairsShortestPaths.java: * for (int i = 0; i < benzene.getAtomCount(); i++) { org/openscience/cdk/graph/AllPairsShortestPaths.java: * for (int j = i + 1; j < benzene.getAtomCount(); j++) { org/openscience/cdk/graph/PathTools.java: * If number of atoms in or below sphere x<max and number of atoms in or below sphere x+1>max then org/openscience/cdk/graph/TripletShortCycles.java: * ESSSR (fingerprints only store cycles |C| <= org/openscience/cdk/isomorphism/Mappings.java: * Iterable<String> strs = mappings.map(new Function<int[], String>() { org/openscience/cdk/isomorphism/Mappings.java: * for (int i = 0; i < input.length; i++) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map<IAtom,IAtom> map : mappings.toAtomMap()) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map.Entry<IAtom,IAtom> e : map.entrySet()) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map<IBond,IBond> map : mappings.toBondMap()) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map.Entry<IBond,IBond> e : map.entrySet()) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map<IChemObject,IChemObject> map : mappings.toBondMap()) { org/openscience/cdk/isomorphism/Mappings.java: * for (Map.Entry<IChemObject,IChemObject> e : map.entrySet()) { org/openscience/cdk/reaction/ReactionSpecification.java: * expected to be <dictionaryNameSpace>:<entryID>. org/openscience/cdk/qsar/DescriptorSpecification.java: * expected to be <dictionaryNameSpace>:<entryID>. org/openscience/cdk/qsar/DescriptorSpecification.java: * expected to be <dictionaryNameSpace>:<entryID>. org/openscience/cdk/signature/MoleculeSignature.java: * List<Orbit> orbits = moleculeSignature.calculateOrbits(); /Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java: * into <UML:Class> elements. /Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java: * Method that serializes ClassDoc objects into <UML:Class> elements. /Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java: * into <listitem> elements (Umbrello specific)?. /Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java: * Method that serializes ClassDoc objects into <listitem> elements org/openscience/cdk/math/qm/GaussiansBasis.java: * S = <phi_i|phi_j> org/openscience/cdk/math/qm/GaussiansBasis.java: * J = <d/dr phi_i | d/dr phi_j> org/openscience/cdk/math/qm/GaussiansBasis.java: * V = <phi_i | 1/r | phi_j> org/openscience/cdk/math/qm/IBasis.java: * Calculate the overlap integral S = <phi_i|phi_j>. org/openscience/cdk/math/qm/IBasis.java: * Calculates the impulse J = -<d/dr chi_i | d/dr chi_j>. org/openscience/cdk/math/qm/IBasis.java: * Calculates the potential V = <chi_i | 1/r | chi_j>. org/openscience/cdk/geometry/alignment/KabschAlignment.java: * for (int i = 0; i < ac1.getAtomCount(); i++) { org/openscience/cdk/io/MDLRXNWriter.java: * > <key> org/openscience/cdk/io/MDLV2000Reader.java: * not currently return field numbers (e.g. DT<n>). org/openscience/cdk/io/ReaderFactory.java: * StringReader stringReader = "<molecule/>"; /Users/john/workspace/github/cdk/cdk/storage/io/src/test/java/org/openscience/cdk/io/cml/JmolTest.java: * <ul><li> <crystal></li></ul> org/openscience/cdk/io/FormatFactory.java: * StringReader stringReader = new StringReader("<molecule/>"); org/openscience/cdk/libio/cml/Convertor.java: * @param useCMLIDs Uses object IDs like 'a1' instead of 'a<hash>'. org/openscience/cdk/normalize/Normalizer.java: * <replace-set> org/openscience/cdk/normalize/Normalizer.java: * <replace>O=N=O</replace> org/openscience/cdk/normalize/Normalizer.java: * <replacement>[O-][N+]=O</replacement> org/openscience/cdk/normalize/Normalizer.java: * </replace-set> org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * System with numbers wrapped in tags. Useful for org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * System with numbers wrapped in tags and the org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * isotope of each Element in tags and the total org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * charge of IMolecularFormula in tags. Useful for org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * wrapped in tags and the isotope of each Element org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * in tags and the total showCharge of IMolecularFormula org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java: * in tags. Useful for displaying formulae in Swing org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: * for (int i = 0; i < nmatch; i++) { org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: * by Craig James the <code>h<n></code> SMARTS pattern should not be used. It was included in the Daylight spec org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: * for backwards compatibility. To match hydrogens, use the <code>H<n></cod> pattern.</li> <li>The wild card net/sf/cdk/tools/bibtex/BibTeXMLFile.java: * Returns an Iterator<BibTeXMLEntry>.

egonw · 2016-10-09T07:38:44Z

tool/smarts/src/main/java/org/openscience/cdk/smarts/SmartsFragmentExtractor.java

+public final class SmartsFragmentExtractor {
+
+    /**
+     * Sets the mode of the extractor to produce SMARTS similar to JCompoundMapper.


Consider adding a @cdk.cite to the JCompoundMapper paper.

+ license header + citation + resync bib with upstream

johnmay mentioned this pull request Aug 18, 2016

extracting the substructure patterns of CircularFingerprint. #224

Closed

ntk73 and others added 20 commits October 6, 2016 21:02

Added new function getFPSmarts()

7a95b98

added function findArrayValue()

f0864b3

implemented functions getChargeSmartsStr() and getAtomSmarts()

d6a0fc5

improved the atom smarts

d354570

improved the atom smarts

f2e3f55

set up for recursive traversing of the fingerprint structure

6515a38

Implemented basic functionality in nodeToString()

6f12927

Implemented handling of 'external' neighbour atoms

58f9899

Fixed a bug in external atom check

2a8e5b1

Implemented handling of ring closures

080f776

Implemented handling of aromatic bonds

1bcd340

code cleanup

de46bbf

Added test case for CircularFingerprint smarts

72c4a65

refactored to junit4

1f4b09e

statis smilesparser for the test

2ae1873

Minor tidy up - no functionally modified.

efef10c

More verbose unit testing with hamcrest matchers.

93922bf

Extract and generalise SMARTS substructure functionality to a separat…

a43a478

…e class. Copies exact functionality ATM.

Change functionality - don't write colon (:) for peripheral aromatic …

adff265

…bonds, it is redundant.

Finalise patch: rename to make it clear what it is doing and is easy …

144f4d7

…to find (start with 'Smarts' prefix). Introduce the MODE_EXACT to specify setting the exact valence. We use int constants rather than enums to avoid adding two classes to the public API.

johnmay force-pushed the patch/smarts-ecfp branch from 59dd8aa to 144f4d7 Compare October 7, 2016 08:49

Forgot other test class - spotted quirk with writing charges which is…

c575ef1

… now resolved.

egonw requested changes Oct 9, 2016

View reviewed changes

Fix reviewer comments.

6889bf8

+ license header + citation + resync bib with upstream

egonw merged commit 02872f8 into master Oct 9, 2016

johnmay deleted the patch/smarts-ecfp branch February 12, 2017 21:44

extracting the substructure patterns of CircularFingerprint. #231

extracting the substructure patterns of CircularFingerprint. #231

Conversation

johnmay commented Aug 18, 2016

johnmay commented Aug 18, 2016

johnmay commented Aug 18, 2016

vedina commented Aug 19, 2016

joergkurtwegner commented Aug 19, 2016

johnmay commented Aug 19, 2016

johnmay commented Aug 19, 2016

vedina commented Aug 19, 2016 • edited

johnmay commented Aug 19, 2016

vedina commented Aug 19, 2016

johnmay commented Aug 19, 2016

vedina commented Aug 19, 2016

johnmay commented Aug 19, 2016

vedina commented Aug 19, 2016

johnmay commented Aug 19, 2016

ntk73 commented Aug 19, 2016 • edited

vedina commented Aug 19, 2016

johnmay commented Aug 19, 2016

egonw commented Aug 21, 2016

vedina commented Aug 21, 2016

aclarkxyz commented Aug 21, 2016

vedina commented Aug 21, 2016

aclarkxyz commented Aug 21, 2016

johnmay commented Aug 21, 2016

aclarkxyz commented Aug 21, 2016

johnmay commented Aug 21, 2016

vedina commented Aug 22, 2016 • edited

johnmay commented Aug 22, 2016

vedina commented Aug 22, 2016 • edited

aclarkxyz commented Aug 22, 2016

mrwns commented Sep 11, 2016

johnmay commented Oct 7, 2016

johnmay commented Oct 7, 2016 • edited

egonw commented Oct 8, 2016

egonw left a comment

Choose a reason for hiding this comment

egonw Oct 9, 2016

Choose a reason for hiding this comment

johnmay Oct 9, 2016

Choose a reason for hiding this comment

egonw Oct 9, 2016

Choose a reason for hiding this comment

johnmay Oct 9, 2016 • edited

Choose a reason for hiding this comment

egonw Oct 9, 2016

Choose a reason for hiding this comment

johnmay Oct 9, 2016

Choose a reason for hiding this comment

egonw Oct 9, 2016

Choose a reason for hiding this comment

vedina commented Aug 19, 2016 •

edited

ntk73 commented Aug 19, 2016 •

edited

vedina commented Aug 22, 2016 •

edited

vedina commented Aug 22, 2016 •

edited

johnmay commented Oct 7, 2016 •

edited

johnmay Oct 9, 2016 •

edited