extracting the substructure patterns of CircularFingerprint. #231

Merged
merged 22 commits into from Oct 9, 2016

johnmay
Member

@johnmay johnmay commented Aug 18, 2016

Clean up of #224

@johnmay
Member Author

johnmay commented Aug 18, 2016

Made some minor clean-ups, and I have some other quick questions. For uncharged atoms it should probably output '+0' for the charge layer. This makes all the atoms 'complex' SMARTS expressions but is more correct. Likewise, a single bond should be output between aromatic atoms if the bond is not aromatic.

@johnmay
Member Author

johnmay commented Aug 18, 2016

For CCC shouldn't the SMARTS produced be:

CC* and C(*)C

are the same?

Explicit SMARTS might capture more of the features:

r=0
[CX4v4H3+0]   C*
[CX4v4H2+0]   C(*)*

r=1
[CX4v4H3+0][CX4v4H2+0]  CC* and C(*)C
[CX4v4H3+0][CX4v4H2+0][CX4v4H3+0] CCC
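
To make the explicit form above concrete, here is a minimal sketch of how such an atom expression could be assembled from connectivity (X), valence (v), hydrogen count (H) and charge. The class `ExplicitAtomSmarts` and the method `atomExpr` are hypothetical names for illustration only; they are not part of CDK or of this pull request.

```java
// Hypothetical helper, not part of CDK: builds an explicit SMARTS atom
// expression of the form [<symbol>X<connectivity>v<valence>H<hcount><charge>].
final class ExplicitAtomSmarts {

    static String atomExpr(String symbol, int connectivity, int valence,
                           int hCount, int charge) {
        StringBuilder sb = new StringBuilder("[");
        sb.append(symbol);
        sb.append('X').append(connectivity);
        sb.append('v').append(valence);
        sb.append('H').append(hCount);
        // always write the charge layer, including "+0" for neutral atoms
        sb.append(charge < 0 ? "-" : "+").append(Math.abs(charge));
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // terminal and central carbon of propane (CCC)
        System.out.println(atomExpr("C", 4, 4, 3, 0)); // [CX4v4H3+0]
        System.out.println(atomExpr("C", 4, 4, 2, 0)); // [CX4v4H2+0]
    }
}
```

Writing '+0' unconditionally is what turns every atom into a bracketed expression, which is the trade-off mentioned earlier in the thread.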

@vedina
Contributor

vedina commented Aug 19, 2016

@ntk73 knows better, but we might not have all the info in place (from the internal CircularFingerprints structures) to generate SMARTS with that level of detail. Regarding explicit SMARTS, [CX4v4H3+0] vs C*: I am not sure explicit SMARTS is what we want, as this feature is implemented as an alternative to the JCompoundMapper one, and I don't recall SMARTS like [CX4v4H3+0] there (to be checked). It could be an option though.

@joergkurtwegner

Keeping it simple seems to be easier, though some patterns might be mapped to the wrong molecule, e.g. depending on whether CCC is a chain or might contain a branch. Then again, since all methods learning on such patterns would not distinguish these cases either, this is what the generalization is about. Thus, the simpler version might just do. If more details are required, I would provide an option to put them into the $(environment) of the SMARTS matching routine and make that an optional choice.

@johnmay
Member Author

johnmay commented Aug 19, 2016

Okay, so a quick question: are these SMARTS meant to be used for building models or for matching molecules?

@vedina I realised at the gym that the internal structures actually cause a bit of a problem. Is the intention that these patterns are searchable? Because that's not possible. Alex implemented the class as standalone, and thus it uses its own internal aromaticity model.

Here are the SMARTS for indole:

N(*)*
C(*)=*
C(=*)*
c(*)(:a):a
c(:a)(*):a
c(:a):a
c(:a):a
c(:a):a
c(:a):a
N(C=*)c(:a):a
N(C=C*)*
C(*)=Cc(:a):a
C(=*)c(c(*):a)c:a
N(*)c(c(*):a)c:a
c(:a)(*)cc:a
c(:a)cc:a
c(:a)cc:a
c(*)(:a)cc:a
N1C=Cc(c1c:a):a
N1C=Cc(c1:a):a
N1C=Cc(c1:a)c:a
N1C=Cc(c1c:a)cc:a
N1C=Cc(c1cc:a)c:a
N(*)c(c(*):a)ccc:a
c(:a)(*)cccc:a
c(*)(:a)cccc:a
C(=*)c(c(*):a)ccc:a
N1C=Cc(c1c:a)c:a
N1C=Cc2c1cccc2
N(*)c1c(*)cccc1
C(=*)c1c(*)cccc1

Although in theory you can use any aromaticity definition in SMARTS, it's best to use Daylight's definition. This ensures portability between implementations and is the default in CDK, and I am guessing in AMBIT's SMARTS too? You can work around it but it's a pain. Separating this out would decouple it from the FP internals and still be correct.

@joergkurtwegner circular fingerprints do better on benchmarks than path-based fingerprints precisely because they distinguish branches. In fact, if you encode heavy degree in a path-based fingerprint, it will perform similarly to circular ones!

Side note: OEChem has a nice API for doing this:
OEFPAtomType - notice the defaults for paths. If you want paths and branches then a tree based fingerprint should be used, we don't yet have that available.

Given the more precise SMARTS, it can be transformed to the looser form, but not vice versa. In its current implementation the SMARTS will actually distinguish branching: e.g. CC(C)C.
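
As a rough illustration of the one-way transformation, the sketch below strips the X/v/H/charge primitives from explicit atom expressions, leaving only the element symbol. The class `LooseSmarts` and its regular expression are assumptions made for this example: they only handle atoms written exactly in the `[CX4v4H3+0]` style shown earlier and only organic-subset symbols that are valid outside brackets.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CDK: reduces explicit atom expressions
// such as [CX4v4H3+0] to the bare element symbol.
final class LooseSmarts {

    // symbol, then X<n> v<n> H<n> and a signed charge, all inside brackets
    private static final Pattern EXPLICIT_ATOM =
        Pattern.compile("\\[([A-Z][a-z]?|[a-z])X\\d+v\\d+H\\d+[+-]\\d+\\]");

    static String loosen(String smarts) {
        // keep only the captured symbol; everything else is discarded
        return EXPLICIT_ATOM.matcher(smarts).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(loosen("[CX4v4H3+0][CX4v4H2+0]")); // CC
        // both explicit carbons collapse to the same loose atom
        System.out.println(loosen("[CX4v4H3+0]")); // C
        System.out.println(loosen("[CX4v4H2+0]")); // C
    }
}
```

Because the CH3 and CH2 expressions both collapse to `C`, the explicit form cannot be recovered from the loose one.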

@johnmay
Member Author

johnmay commented Aug 19, 2016

Here's the SMARTS if you use Daylight aromaticity:

n(:a):a
c(:a):a
c(:a):a
c(:a)(:a):a
c(:a)(:a):a
c(:a):a
c(:a):a
c(:a):a
c(:a):a
n(c:a)c(:a):a
n(c=c:a):a
c(:a)=cc(:a):a
c(:a)c(c(:a):a)c:a
n(:a)c(c(:a):a)c:a
c(:a)(:a)cc:a
c(:a)cc:a
c(:a)cc:a
c(:a)(:a)cc:a
n1c=cc(c1c:a):a
n1c=cc(c1:a):a
n1c=cc(c1:a)c:a
n1c=cc(c1c:a)cc:a
n1c=cc(c1cc:a)c:a
n(:a)c(c(:a):a)ccc:a
c(:a)(:a)cccc:a
c(:a)(:a)cccc:a
c(:a)c(c(:a):a)ccc:a
n1c=cc(c1c:a)c:a
n1c=cc2c1cccc2
n(:a)c1c(:a)cccc1
c(:a)c1c(:a)cccc1

@vedina
Contributor

vedina commented Aug 19, 2016

@johnmay the use case is that models are typically built directly with (ECFP) fingerprints, and SMARTS are for matching molecules after the models identify the "best" fingerprints (i.e. making the models interpretable).

Regarding internal structures - neither @ntk73 nor I are the original authors of the CircularFingerprint, so we just used what's there. Indeed the CircularFingerprint authors decided to use their own atom typing and aromaticity detection. I agree that if we generate SMARTS with one type of aromaticity detection (CircularFP) and use them for matching with another one (Daylight, CDK default, etc.) it will be inconsistent in some cases, but I think for the sake of simplicity I would prefer to leave it as it is. As already said, it may even benefit the model generalization (a certain level of noise in descriptors is helpful in theory and practice).

(this thread aside, I agree one can get comparable performance with path fingerprints if encoding branching :) )

@johnmay
Member Author

johnmay commented Aug 19, 2016

Then using the molecular rather than the fingerprint aromaticity is preferable I think?

@vedina
Contributor

vedina commented Aug 19, 2016

@johnmay yes, though this means we'll have to ignore the CircularFingerprint internal aromaticity and match the atoms in the fingerprint back to the original molecule to use the CDK aromaticity flags. @ntk73 do you think it's possible?

@johnmay
Member Author

johnmay commented Aug 19, 2016

I made the modification to print the output listed above. This also means it no longer needs to be in the same class.

@vedina
Contributor

vedina commented Aug 19, 2016

John, do I understand correctly that you already have the modification to use the CDK aromaticity when printing the SMARTS (could you point to the code)? Maybe it's best to have both as different options? Otherwise, fine to have the printing in a separate class.

(having in mind that SMARTS may be in principle visualised by arbitrary tools, not necessarily CDK, things might get even more inconsistent than discussed here)

@johnmay
Member Author

johnmay commented Aug 19, 2016

If you use the SmartsPattern class it will use Daylight aromaticity. If you use the Pattern class with an IQueryAtomContainer it will use the aromaticity as defined on the molecule. You will also need to set the SMARTS invariants. The difficulty here is that the aromaticity model is locked inside the fingerprint and I don't want to expose that. You can mimic it with this, but there's no guarantee it will match exactly.

Do not use this! It is not correct and does not match exactly.

    SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
    IAtomContainer mol = smipar.parseSmiles("[nH]1ccc2c1cccc2");

    // XXX! does not match the semantics of the CircularFingerprint aromaticity model
    Aromaticity arom = new Aromaticity(ElectronDonation.piBonds(),
                                       Cycles.all(6));

    // set up SMARTS invariants
    SmartsMatchers.prepare(mol, true);

    Pattern ptrn = Pattern.findSubstructure(SMARTSParser.parse("N1C=Cc(c1c:a):a", null));
    // match before the custom aromaticity model is applied
    System.err.println(ptrn.matches(mol));
    arom.apply(mol);
    // match again after the custom aromaticity model is applied
    System.err.println(ptrn.matches(mol));

I really think the correct option is just to write out the aromaticity flags on the mol and not those in the fingerprint. Otherwise it's somewhat of a house of cards.

@vedina
Contributor

vedina commented Aug 19, 2016

@johnmay - I am probably missing something, but how can one use either the SmartsPattern or Pattern class on the internal CircularFingerprint structures and not on the IAtomContainer? @ntk73 just confirmed he is NOT using the molecule to write the SMARTS, but the internal CircularFingerprint arrays.

@johnmay
Member Author

johnmay commented Aug 19, 2016

Sorry, I realised I didn't answer the question - that example showed what you'd have to do to use the SMARTS produced by the current form. Here are the changes:

in nodeToString(int atom)

--- if (bondArom[neighborBo])
+++ if (mol.getBond(neighborBo).isArom())

in getAtomSmarts()

--- if (atomArom[atNum])
+++ if (mol.getAtom(atNum).isArom())

@ntk73
Contributor

ntk73 commented Aug 19, 2016

John, I think that it is possible (although rare) to have some side effects from your last changes. We intentionally chose to use the locally defined aromaticity within SMARTS generation because it is used for the fragment identification as well.
Now it might happen that two different fragments identified by the CircularFingerprint get the same SMARTS because of the different aromaticity detection.

@vedina
Contributor

vedina commented Aug 19, 2016

... or to have the same fingerprint associated with different SMARTS for different molecules.

(but as said noisy descriptors are not a big deal. and we are not going to solve the different chemistry across all toolkits anyway, so let's have what's more appropriate from the CDK point of view. )

@johnmay
Member Author

johnmay commented Aug 19, 2016

@ntk73 Indeed, but I think this is actually what you want. Fingerprints can set multiple bits for the same parts; Daylight, for example, sets a variable number depending on the size of the fragment, while RDKit sets a fixed number. It's actually kind of a bug that CDK doesn't do it in the path-based fingerprint. Andrew Dalke has a write-up somewhere on this...
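
For readers unfamiliar with the multiple-bits idea, here is a minimal sketch of setting a fixed number of bits per fragment when folding into a bit set. Seeding `java.util.Random` with the fragment hash is an assumption chosen purely for illustration; it is not how Daylight, RDKit or CDK derive their bit positions, and `MultiBitFolding` is a hypothetical name.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical helper, not part of any toolkit: sets several
// pseudo-random bit positions for one fragment hash.
final class MultiBitFolding {

    static void setBits(BitSet fp, int size, int fragmentHash, int bitsPerFragment) {
        // same hash => same seed => same bit positions every time
        Random rng = new Random(fragmentHash);
        for (int i = 0; i < bitsPerFragment; i++)
            fp.set(rng.nextInt(size));
    }

    public static void main(String[] args) {
        BitSet fp = new BitSet(1024);
        setBits(fp, 1024, 42, 2); // 42 is an arbitrary example hash
        // at most 2 bits are set; fewer if the positions collide
        System.out.println(fp.cardinality());
    }
}
```

A variable scheme would simply make `bitsPerFragment` a function of the fragment size.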

@egonw
Member

egonw commented Aug 21, 2016

Hi all, sorry for being late to the party... when we pulled in the CircularFingerprint, they expressed a very strong wish to not swap internal algorithms, so that their validated implementation matches exactly the upstream implementation. Changing SMARTS, the aromaticity models, etc, clearly would change the calculated results, and making it break from the expected (upstream) results... when they pushed, I indicated to prefer it would reuse existing algorithms from the CDK library for various properties, but they insisted on using their own for the above reasons.

Please make sure to include Alex Clark in this discussion!

@vedina
Contributor

vedina commented Aug 21, 2016

OK, @aclarkxyz , please comment (I hope it's the right github account).

We are not changing the aromaticity model (or anything else) of generating fingerprints, just assigning SMARTS for each fingerprint for interpretability.

I am afraid if we want the generated SMARTS to follow the internal CDKFingerprinter view of the chemistry, we'll need to provide a separate SMARTS parser and subgraph isomorphism matching at the worst case. So what John proposed is a compromise.

@aclarkxyz
Contributor

What @egonw said. It's absolutely critical that the CircularFingerprint results never change: if they start giving different results, then the entire value proposition goes down the drain. They don't make use of any potentially evolving library algorithms, like the general-purpose aromaticity detector code, which also means you don't have any additional restrictions on what you can or can't do with them.

@vedina
Contributor

vedina commented Aug 21, 2016

@aclarkxyz this pull request implements an extension, which does not change the CircularFingerprint, but adds a function to generate SMARTS per fingerprint. Could you please comment on this? If it's not OK to add the function to this class, we may move the extension into a new subclass.

@aclarkxyz
Contributor

Wouldn't it be a much better idea to put the SMARTS features into a different file? The circular fingerprinter implementation is already fairly long, is tightly self-contained, and does exactly one thing (and meticulously carefully, might I add). Adding a bunch of peripheral functionality to it doesn't seem like the best call, especially since there's no way to make sure that the additions are actually describing the same concepts ("close enough" is as close as you will ever get, so I don't think it qualifies as core content).

@johnmay
Member Author

johnmay commented Aug 21, 2016

Nina/Nikolay, are you happy for me to move the code to a separate class and use the model's aromaticity flags? Is CircularFingerprintSmarts okay?

@aclarkxyz
Contributor

Unless I'm missing something from only having skim-read the code, what you have is basically {molecule, atom indices} -> {SMARTS string}, with a side-note that the output should try to be compatible with the CircularFingerprinter's aromaticity model, if possible (which it isn't, so I wouldn't worry too much about it). That sounds like a very generic algorithm that doesn't necessarily need to be associated with this particular fingerprint at all, except by historical coincidence.

@johnmay
Member Author

johnmay commented Aug 21, 2016

@aclarkxyz kind of... locking in valence, connectivity, and charge as discussed above, it could be general. However, in its current implementation it is specific to circular fingerprints, as it includes the next iteration's neighbour information.

For example for benzene the iteration (ECFP0) will have a single atom index but SMARTS will have three atoms.

c(:a):a d=0
c(c:a)c:a d=2
c(cc:a)cc:a d=4
c1ccccc1 d=6
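
To show where the extra atoms come from, here is a small sketch that, given a fragment's atom indices, collects the neighbouring atoms that are bonded to the fragment but not part of it. These are the atoms that appear as the trailing `a` in the patterns above. The adjacency-list representation and the class `FragmentFrontier` are assumptions for illustration and do not reflect the CircularFingerprinter internals.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of CDK: finds atoms adjacent to a fragment.
final class FragmentFrontier {

    // atoms bonded to the fragment but not contained in it
    static Set<Integer> frontier(int[][] adjacency, Set<Integer> fragment) {
        Set<Integer> result = new TreeSet<>();
        for (int atom : fragment)
            for (int nbr : adjacency[atom])
                if (!fragment.contains(nbr))
                    result.add(nbr);
        return result;
    }

    public static void main(String[] args) {
        // benzene ring: atom i is bonded to its two ring neighbours
        int[][] benzene = {{1, 5}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0}};
        Set<Integer> core = new TreeSet<>();
        core.add(0); // iteration 0 encodes a single atom index
        System.out.println(frontier(benzene, core)); // prints [1, 5]
    }
}
```

One core atom plus two frontier atoms gives the three-atom SMARTS `c(:a):a` for d=0.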

@vedina
Contributor

vedina commented Aug 22, 2016

In principle SMARTS generation is a generic algorithm, but in this case it's indeed quite locked to the internal representation.
@johnmay fine to move to external class if you can disentangle the generation code from the internal structures it uses ...
@johnmay , @aclarkxyz - OK to use either of the aromaticity models for SMARTS generation. It's a compromise; neither of the options are perfect, and an ideal solution needs reconciliation of the aromaticity models, which is not going to happen either way.

Otherwise it would be nice to have a common interface for such substructure generating functionality for other fingerprints as well, I'm happy to help if we go this way.

@johnmay
Member Author

johnmay commented Aug 22, 2016

Already extracted, Nina, but with Alex's comment I'm now wondering if this is a bit specific and is better kept downstream, in JCompoundMapper/AMBIT? I can integrate the SMARTS generation parts, which would simplify the method, but since the API in the CircularFingerprint already exposes which atom indices are encoded, it's a simple transformation from that information to the SMARTS.

It's difficult to judge whether this is library functionality or application specific.

Thoughts?

@vedina
Contributor

vedina commented Aug 22, 2016

Well, I don't think it's specific; it's core functionality of a fingerprint to be able to see the substructure of each fingerprint (I admit this is not always possible). I definitely think this is NOT to be handled downstream. JCompoundMapper has its own implementation of both circular fingerprints and SMARTS generation, so it is not at all interested in such functionality. I would say it's vice versa: theoretically one can port the many JCompoundMapper fingerprints into CDK and ensure they all have the "interpretability" feature.
Of course if CDK doesn't want to have such functionality, we will consider moving it into AMBIT, but I thought it would be beneficial functionality of a core library.

Otherwise, we have a few other fingerprints in mind that we would like to make "interpretable". I was initially looking to implement a common interface, but learned that no such interface exists at the moment in CDK.

@aclarkxyz
Contributor

The circular fingerprints are already interpretable: atom indices for each fingerprint means you already have the information you need to do some pretty interesting things (e.g. https://cheminf20.org/2015/08/31/visualisation-of-structure-activity-models-fudging-it-with-a-widget/). That's the kind of low level minimalism you get from the algorithm itself, because it can't be done anywhere else, and doesn't add anything that isn't essential. Converting that into SMARTS is very much an interoperability feature that's not at all part of the core functionality... and there could be dozens of other use cases for it. Seems to me like it should be named accordingly, e.g. "SubfragmentQuery" or somesuch. If other fingerprints could be minimally augmented so that they reveal atom indices, then they could be channelled into the new feature. (The reason for this abstraction would be that the molecule+indices definition is literally correct; whereas shoehorning that into SMARTS is technically incorrect - for this algorithm, at least - and is a high level interpretation with a failure rate that may or may not be acceptable for a given use case... that's a decision to be made elsewhere, not inside the fingerprinter library itself.)

@mrwns

mrwns commented Sep 11, 2016

@johnmay , I have slightly modified the SubstructureSmarts extractor to optionally also include information about valency/degree, implicit H count and atom-mapping, so it can be used to extract reaction rules. Let me just find out how I can share it, as I have not used github yet to push code.

@johnmay
Member Author

johnmay commented Oct 7, 2016

Sorry for delay - all done now, rebased on master, renamed the class, and added the option (the default) to produce correct SMILES. To match the behaviour discussed in this thread MODE_JCOMPOUNDMAPPER can be set.

@johnmay
Member Author

johnmay commented Oct 7, 2016

@egonw you okay to do the merge on this one? - I did a lot of modifying in the end.

@egonw
Member

egonw commented Oct 8, 2016

I'll look at it today.

Member

@egonw egonw left a comment

Only major thing is the missing CLW header in one of the files. Others I leave to the authors to decide on.

@@ -0,0 +1,126 @@
package org.openscience.cdk.fingerprint;
Member

Please add the copyright header here.

Member Author

done - missed that on the first time through

* needs to match the nitrogen.
*
* <p><b>Basic Usage:</b></p>
* <pre>{@code
Member

What is "{@code"? Should we start using that for all JavaDoc? If so, sounds like a nice Junior Job...

Member Author

@johnmay johnmay Oct 9, 2016

It should be already used. Main advantage IIRC is you can write:

<pre>{@code
List<IAtom> atoms = new ArrayList<IAtom>();
}</pre>

instead of

<pre>
List&lt;IAtom&gt; atoms = new ArrayList&lt;IAtom&gt;();
</pre>

Member

I meant, should we check all code examples and add it where it is not used yet? (not for this PR!)

Member Author

I think that's the only difference, you can find them with this:

find ~/workspace/github/cdk -name '*.java' -exec grep -H '&lt;' {} \;
/Users/john/workspace/github/cdk/cdk/app/depict/src/test/java/org/openscience/cdk/depict/SvgDrawVisitorTest.java:                                          + "    <text  x='50.0' y='50.0' fill='#FF0000' text-anchor='middle'>PNG &lt; EPS &lt; SVG</text>\n"
org/openscience/cdk/graph/AllPairsShortestPaths.java: * for (int i = 0; i &lt; benzene.getAtomCount(); i++) {
org/openscience/cdk/graph/AllPairsShortestPaths.java: *     for (int j = i + 1; j &lt; benzene.getAtomCount(); j++) {
org/openscience/cdk/graph/PathTools.java:     * If number of atoms in or below sphere x&lt;max and number of atoms in or below sphere x+1&gt;max then
org/openscience/cdk/graph/TripletShortCycles.java: * ESSSR <br/>(fingerprints only store cycles |C| &lt;=
org/openscience/cdk/isomorphism/Mappings.java:     *     Iterable&lt;String&gt; strs = mappings.map(new Function&lt;int[], String&gt;() {
org/openscience/cdk/isomorphism/Mappings.java:     *             for (int i = 0; i &lt; input.length; i++) {
org/openscience/cdk/isomorphism/Mappings.java:     * for (Map&lt;IAtom,IAtom&gt; map : mappings.toAtomMap()) {
org/openscience/cdk/isomorphism/Mappings.java:     *     for (Map.Entry&lt;IAtom,IAtom&gt; e : map.entrySet()) {
org/openscience/cdk/isomorphism/Mappings.java:     * for (Map&lt;IBond,IBond&gt; map : mappings.toBondMap()) {
org/openscience/cdk/isomorphism/Mappings.java:     *     for (Map.Entry&lt;IBond,IBond&gt; e : map.entrySet()) {
org/openscience/cdk/isomorphism/Mappings.java:     * for (Map&lt;IChemObject,IChemObject&gt; map : mappings.toBondMap()) {
org/openscience/cdk/isomorphism/Mappings.java:     *     for (Map.Entry&lt;IChemObject,IChemObject&gt; e : map.entrySet()) {
org/openscience/cdk/reaction/ReactionSpecification.java:     *          expected to be &lt;dictionaryNameSpace&gt;:&lt;entryID&gt;.
org/openscience/cdk/qsar/DescriptorSpecification.java:     *          expected to be &lt;dictionaryNameSpace&gt;:&lt;entryID&gt;.
org/openscience/cdk/qsar/DescriptorSpecification.java:     *          expected to be &lt;dictionaryNameSpace&gt;:&lt;entryID&gt;.
org/openscience/cdk/signature/MoleculeSignature.java: * List&lt;Orbit&gt; orbits = moleculeSignature.calculateOrbits();
/Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java:     * into &lt;UML:Class> elements.
/Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java:     * Method that serializes ClassDoc objects into &lt;UML:Class> elements.
/Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java:     * into &lt;listitem> elements (Umbrello specific)?.
/Users/john/workspace/github/cdk/cdk/doc/javadoc/source/XMIDoclet.java:     * Method that serializes ClassDoc objects into &lt;listitem> elements
org/openscience/cdk/math/qm/GaussiansBasis.java: * S = &lt;phi_i|phi_j><br>
org/openscience/cdk/math/qm/GaussiansBasis.java: * J = &lt;d/dr phi_i | d/dr phi_j><br>
org/openscience/cdk/math/qm/GaussiansBasis.java: * V = &lt;phi_i | 1/r | phi_j><br>
org/openscience/cdk/math/qm/IBasis.java:     * Calculate the overlap integral S = &lt;phi_i|phi_j>.
org/openscience/cdk/math/qm/IBasis.java:     * Calculates the impulse J = -&lt;d/dr chi_i | d/dr chi_j>.
org/openscience/cdk/math/qm/IBasis.java:     * Calculates the potential V = &lt;chi_i | 1/r | chi_j>.
org/openscience/cdk/geometry/alignment/KabschAlignment.java: * for (int i = 0; i &lt; ac1.getAtomCount(); i++) {
org/openscience/cdk/io/MDLRXNWriter.java:     * &gt; &lt;key&gt;<br>
org/openscience/cdk/io/MDLV2000Reader.java:     * not currently return field numbers (e.g. DT&lt;n&gt;).
org/openscience/cdk/io/ReaderFactory.java: *   StringReader stringReader = "&lt;molecule/>";
/Users/john/workspace/github/cdk/cdk/storage/io/src/test/java/org/openscience/cdk/io/cml/JmolTest.java:     * <ul><li> &lt;crystal></li></ul>
org/openscience/cdk/io/FormatFactory.java: *   StringReader stringReader = new StringReader("&lt;molecule/>");
org/openscience/cdk/libio/cml/Convertor.java:     * @param useCMLIDs Uses object IDs like 'a1' instead of 'a&lt;hash>'.
org/openscience/cdk/normalize/Normalizer.java:     *  &lt;replace-set&gt;<br>
org/openscience/cdk/normalize/Normalizer.java:     *  &lt;replace&gt;O=N=O&lt;/replace&gt;<br>
org/openscience/cdk/normalize/Normalizer.java:     *  &lt;replacement&gt;[O-][N+]=O&lt;/replacement&gt;<br>
org/openscience/cdk/normalize/Normalizer.java:     *  &lt;/replace-set&gt;<br>
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * System with numbers wrapped in &lt;sub&gt;&lt;/sub&gt; tags. Useful for
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * System with numbers wrapped in &lt;sub&gt;&lt;/sub&gt; tags and the
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * isotope of each Element in &lt;sup&gt;&lt;/sup&gt; tags and the total
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * charge of IMolecularFormula in &lt;sup&gt;&lt;/sup&gt; tags. Useful for
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * wrapped in &lt;sub&gt;&lt;/sub&gt; tags and the isotope of each Element
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * in &lt;sup&gt;&lt;/sup&gt; tags and the total showCharge of IMolecularFormula
org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator.java:     * in &lt;sup&gt;&lt;/sup&gt; tags. Useful for displaying formulae in Swing
org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: *    for (int i = 0; i &lt; nmatch; i++) {
org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: * by Craig James the <code>h&lt;n&gt;</code> SMARTS pattern should not be used. It was included in the Daylight spec
org/openscience/cdk/smiles/smarts/SMARTSQueryTool.java: * for backwards compatibility. To match hydrogens, use the <code>H&lt;n&gt;</cod> pattern.</li> <li>The wild card
net/sf/cdk/tools/bibtex/BibTeXMLFile.java:   * Returns an Iterator&lt;BibTeXMLEntry>.

public final class SmartsFragmentExtractor {

/**
* Sets the mode of the extractor to produce SMARTS similar to JCompoundMapper.
Member

Consider adding a @cdk.cite to the JCompoundMapper paper.

+ license header
+ citation
+ resync bib with upstream
@egonw egonw merged commit 02872f8 into master Oct 9, 2016
@johnmay johnmay deleted the patch/smarts-ecfp branch February 12, 2017 21:44