Read Optimised Data Structures #278

johnmay · 2017-03-14T22:44:04Z

IMPORTANT: Proof of concept only (do not apply)

The CDK data structures are write optimised - neither atoms or bonds no about the molecule which they belong and their index. This is efficient for generating structures as atoms and bonds can be re-usued. However this makes traversal painfully inefficient |E|^2 instead of |V|.

In practise object creation is cheap having to re-create atoms/bonds should not be optimised - unfortunately this would be a very destructive change (/rewrite). To avoid the inefficient traversals in the past I introduced and the GraphUtil API. This is still less efficient than just storing the connection table like this but reduces and |E|^2 traversal to an |E|+|V| (|E|=build cost).

This has worked successfully for many algorithms but is clunky to use, accessing bonds was also nasty by a secondary map instance:

IAtomContainer mol = ...; // plain old molecule
GraphUtil.EdgeToBondMap bondsmap = GraphUtil.EdgeToBondMap.withSpaceFor(mol);
int[][] adjlist = GraphUtil.toAdjList(mol, bondsmap);

void dfs(boolean[] mark, int[][] adjlist, int src, int prev) {
  mark[src] = true;
  for (int dst : adjlist[src]) {
    if (dst != prev)
      dfs(mark, adjlist, src, prev);
  }
}

// dfs bond info
void dfs(boolean[] mark, int[][] adjlist, GraphUtil.EdgeToBondMap bondmap, int src, int prev) {
  mark[src] = true;
  for (int dst : adjlist[src]) {
    IBond bond = bondmap.get(src, dst);
    if (dst != prev)
      dfs(mark, adjlist, src, prev);
  }
}

This patch introduces two new class AtomRef and BondRef that reference atoms and bonds in an AtomContainer instance. Each ref stores the index in the container and the adjacency of the bonds. The class implement the IBond and IAtom interfaces and pass all methods through to an underlying implementation. This means you can use the atoms and bonds with existing method.

AtomRef[] arefs = AtomRef.getAtomRefs(mol); // non-final

void dfs(boolean[] mark, AtomRef a, BondRef last) {
  mark[a.getIndex()] = true;
  for (BondRef b : a.getBonds()) {
  if (b != last)
     dfs(mark, a, last);
  }
}

I think I also want to integrate stereo chemistry and may well also provide an AtomContainerRef.

@egonw @rajarshi let me know your thoughts then will put it to mailing list. I know in particularly Till of Scaffold Hunter use to have a fork with faster traversal - this should help him but providing a standard API to do it.

egonw · 2017-03-15T16:57:51Z

I had a go at this 2 or 3 times... at the time I tried doing it an an IAtomContainer level, which generally was a pain, because of the invalidation during making changes to the container... also, and more importantly, it never really gave an significant speed up at computation level... that said, all those approaches did not collapse things first into a connectivity matrix, so perhaps not surprising... the issue was, in these approaches, that the overhead of extra bookkeeping was really less than the win of the better scalability for most data sets I played with... (do you have performance stats?)

So, I'm happy to hear you're having a go at this. Downstream algorithms still need to know if they need to call the AtomRef.getAtomRefs() again, or is that the idea in the first place? Is that fast enough?

rajarshi · 2017-03-15T19:01:49Z

Nice patch. It looks like a good alternative to a full rewrite. I can definitely see the utility in avoiding unnecessary bookkeeping code, and if the refs are keeping track of indices and adjacent bonds, I'm guessing that will improve performance. It'd be interesting to see some benchmarks. But even if there is no improvement (and no degradation) I'd support it on the basis of cleaner code.

Nice work and I support bringing it up on the list.

johnmay · 2017-03-15T19:14:53Z

I tried doing it an an IAtomContainer level, which generally was a pain, because of the invalidation during making changes to the container...

Yes that would be a pain, just to clarify as documented this does no book keeping and is specifically designed to be read only. Writes are v. rare - but do happen - as you say the user needs to know what to call. I would see this as an expert mode for those that need the read access speed.

(do you have performance stats?)

I'll write a ring membership algorithm using one and then the other to show the times.

I'm guessing that will improve performance. It'd be interesting to see some benchmarks. But even if there is no improvement (and no degradation) I'd support it on the basis of cleaner code.

So I think what I do at the moment with the int[][] will still be optimised better - mainly due to memory layout but the win here is much cleaner code. My actual use case them prompted the clean up of how I do it currently was I need to add a custom comparator to the canonicalisation:

Comparator<AtomRef> comp = new Comparator<AtomRef>(){
  int compare(AtomRef a, AtomRef b) {
     // compare bonds etc..
  }
}

rajarshi · 2017-03-15T19:23:22Z

I agree that this would be expert mode. However, I'm a fan of the 'one good way to do it' philosophy to toolkit/language design. Having two modes goes against that, but I can see why it might be necessary. I assume that writes will fail when using these ref objects?

johnmay · 2017-03-15T19:36:11Z

By writes I mean data structure write i.e. adding/deleting bonds. Set attributes is safe and supported.

As Egon says detecting if something goes out of sync is pain - although technically possible. I might investigate, at the least I could have something similar to "concurrent modification exception".

johnmay · 2017-03-15T22:40:33Z

Runnable JAR:
Uploading cdkbmark.jar.zip…

Results (+blog post) to follow:

/*
 * Copyright (c) 2017 John May <jwmay@users.sf.net>
 *
 * Contact: cdk-devel@lists.sourceforge.net
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version. All we ask is that proper credit is given
 * for our work, which includes - but is not limited to - adding the above
 * copyright notice to the beginning of your source code files, and to any
 * copyright notice that you may distribute with programs based on this work.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
 * License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

package org.openscience.cdk;

import joptsimple.OptionParser;
import joptsimple.OptionSet;
import joptsimple.OptionSpec;
import org.openscience.cdk.exception.InvalidSmilesException;
import org.openscience.cdk.graph.GraphUtil;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IBond;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class Benchmark {

    private static OptionParser        optpar     = new OptionParser();
    private static OptionSpec<Integer> stepSpec   = optpar.accepts("step").withRequiredArg().ofType(Integer.class).defaultsTo(100000);
    private static OptionSpec<String>  filterSpec = optpar.accepts("filter").withRequiredArg().ofType(String.class).defaultsTo(".+");
    private static OptionSpec<File>    filesSpec  = optpar.nonOptions().ofType(File.class);


    private static abstract class MeasureAlgorithm {
        long tInit = 0;
        long tRun  = 0;
        long check = 0;
        String desc;

        public MeasureAlgorithm(String desc) {
            this.desc = desc;
        }

        abstract void process(IAtomContainer mol);
    }

    private static MeasureAlgorithm[] algs = new MeasureAlgorithm[]{
        new MeasureAlgorithm("AtomContainer\tDepthFirst\t") {

            private void dfs(IAtomContainer mol, IAtom src, IBond prev) {
                check++;
                src.setFlag(CDKConstants.VISITED, true);
                for (IBond bond : mol.getConnectedBondsList(src)) {
                    if (bond == prev)
                        continue;
                    IAtom dst = bond.getConnectedAtom(src);
                    if (!dst.getFlag(CDKConstants.VISITED))
                        dfs(mol, dst, bond);
                }
            }

            @Override
            void process(IAtomContainer mol) {
                long t0 = System.nanoTime();
                for (IAtom atom : mol.atoms())
                    atom.setFlag(CDKConstants.VISITED, false);
                for (IAtom atom : mol.atoms())
                    if (!atom.getFlag(CDKConstants.VISITED))
                        dfs(mol, atom, null);
                long t1 = System.nanoTime();
                tRun += t1 - t0;
            }
        },
        new MeasureAlgorithm("int[][]\tDepthFirst\tVisitFlag") {

            private int dfs(IAtomContainer mol, int[][] adjlist, int src, int prev) {
                check++;
                IAtom atm = mol.getAtom(src);
                atm.setFlag(CDKConstants.VISITED, true);
                for (int dst : adjlist[src]) {
                    if (dst == prev)
                        continue;
                    IAtom nbr = mol.getAtom(dst);
                    if (!nbr.getFlag(CDKConstants.VISITED))
                        dfs(mol, adjlist, dst, src);
                }
                return 0;
            }

            @Override
            void process(IAtomContainer mol) {
                long                    t0      = System.nanoTime();
                GraphUtil.EdgeToBondMap bmap    = GraphUtil.EdgeToBondMap.withSpaceFor(mol);
                int[][]                 adjlist = GraphUtil.toAdjList(mol, bmap);
                long                    t1      = System.nanoTime();
                for (IAtom atom : mol.atoms())
                    atom.setFlag(CDKConstants.VISITED, false);
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    IAtom atom = mol.getAtom(i);
                    if (!atom.getFlag(CDKConstants.VISITED))
                        dfs(mol, adjlist, i, i);
                }
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
            }
        },
        new MeasureAlgorithm("int[][]\tDepthFirst\tVisitArray") {

            private int dfs(IAtomContainer mol, int[][] adjlist, int src, int prev, boolean[] visit) {
                check++;
                visit[src] = true;
                for (int dst : adjlist[src]) {
                    if (dst == prev)
                        continue;
                    if (!visit[dst])
                        dfs(mol, adjlist, dst, src, visit);
                }
                return 0;
            }

            @Override
            void process(IAtomContainer mol) {
                long                    t0      = System.nanoTime();
                GraphUtil.EdgeToBondMap bmap    = GraphUtil.EdgeToBondMap.withSpaceFor(mol);
                int[][]                 adjlist = GraphUtil.toAdjList(mol, bmap);
                long                    t1      = System.nanoTime();
                boolean[]               visit   = new boolean[mol.getAtomCount()];
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    if (!visit[i]) dfs(mol, adjlist, i, i, visit);
                }
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
            }
        },
        new MeasureAlgorithm("AtomRef\tDepthFirst\tVisitFlag") {

            private final int dfs(AtomRef src, BondRef prev) {
                check++;
                src.setFlag(CDKConstants.VISITED, true);
                for (BondRef bond : src.getBonds()) {
                    if (bond == prev)
                        continue;
                    AtomRef dst = bond.getConnectedAtom(src);
                    if (!dst.getFlag(CDKConstants.VISITED))
                        dfs(dst, bond);
                }
                return 0;
            }

            @Override
            void process(IAtomContainer mol) {
                long      t0    = System.nanoTime();
                AtomRef[] arefs = AtomRef.getAtomRefs(mol);
                long      t1    = System.nanoTime();
                for (IAtom atom : mol.atoms())
                    atom.setFlag(CDKConstants.VISITED, false);
                for (AtomRef aref : arefs)
                    if (!aref.getFlag(CDKConstants.VISITED))
                        dfs(aref, null);
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
            }
        },
        new MeasureAlgorithm("AtomRef\tDepthFirst\tVisitArray") {

            private int dfs(AtomRef src, BondRef prev, boolean[] visit) {
                check++;
                visit[src.getIndex()] = true;
                for (BondRef bond : src.getBonds()) {
                    if (bond == prev)
                        continue;
                    AtomRef dst = bond.getConnectedAtom(src);
                    if (!visit[dst.getIndex()])
                        dfs(dst, bond, visit);
                }
                return 0;
            }

            @Override
            void process(IAtomContainer mol) {
                long      t0    = System.nanoTime();
                AtomRef[] arefs = AtomRef.getAtomRefs(mol);
                long      t1    = System.nanoTime();
                boolean[] visit = new boolean[mol.getAtomCount()];
                for (AtomRef aref : arefs)
                    if (!visit[aref.getIndex()])
                        dfs(aref, null, visit);
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
            }
        },
        new MeasureAlgorithm("AtomContainer\tRelaxation\t") {

            @Override
            void process(IAtomContainer mol) {
                long  t0   = System.nanoTime();
                int[] prev = new int[mol.getAtomCount()];
                int[] next = new int[mol.getAtomCount()];
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    next[i] = prev[i] = mol.getAtom(i).getAtomicNumber();
                }
                for (int rep = 0; rep < mol.getAtomCount(); rep++) {
                    for (int j = 0; j < mol.getAtomCount(); j++) {
                        IAtom atom = mol.getAtom(j);
                        for (IBond bond : mol.getConnectedBondsList(atom)) {
                            IAtom nbr = bond.getConnectedAtom(atom);
                            next[j] += prev[mol.getAtomNumber(nbr)];
                        }
                    }
                    System.arraycopy(next, 0, prev, 0, next.length);
                }
                long t1 = System.nanoTime();
                tRun += t1 - t0;
                for (int aNext : next) check += aNext;
            }
        },
        new MeasureAlgorithm("AtomContainer\tRelaxation\tImproved") {

            @Override
            void process(IAtomContainer mol) {
                long  t0   = System.nanoTime();
                int[] prev = new int[mol.getAtomCount()];
                int[] next = new int[mol.getAtomCount()];
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    next[i] = prev[i] = mol.getAtom(i).getAtomicNumber();
                }
                for (int rep = 0; rep < mol.getAtomCount(); rep++) {
                    for (IBond bond : mol.bonds()) {
                        IAtom beg    = bond.getAtom(0);
                        IAtom end    = bond.getAtom(1);
                        int   begIdx = mol.getAtomNumber(beg);
                        int   endIdx = mol.getAtomNumber(end);
                        next[begIdx] += prev[endIdx];
                        next[endIdx] += prev[begIdx];
                    }
                    System.arraycopy(next, 0, prev, 0, next.length);
                }
                long t1 = System.nanoTime();
                tRun += t1 - t0;
                for (int aNext : next) check += aNext;
            }
        },
        new MeasureAlgorithm("AtomRef\tRelaxation\t") {

            @Override
            void process(IAtomContainer mol) {
                long      t0    = System.nanoTime();
                AtomRef[] arefs = AtomRef.getAtomRefs(mol);
                long      t1    = System.nanoTime();
                int[]     prev  = new int[mol.getAtomCount()];
                int[]     next  = new int[mol.getAtomCount()];
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    next[i] = prev[i] = mol.getAtom(i).getAtomicNumber();
                }
                for (int rep = 0; rep < mol.getAtomCount(); rep++) {
                    for (AtomRef aref : arefs) {
                        int idx = aref.getIndex();
                        for (BondRef bond : aref.getBonds()) {
                            next[idx] += prev[bond.getConnectedAtom(aref).getIndex()];
                        }
                    }
                    System.arraycopy(next, 0, prev, 0, next.length);
                }
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
                for (int aNext : next) check += aNext;
            }
        },
        new MeasureAlgorithm("BondRef\tRelaxation\t") {

            @Override
            void process(IAtomContainer mol) {
                long      t0    = System.nanoTime();
                BondRef[] bonds = BondRef.getBondRefs(mol);
                long      t1    = System.nanoTime();
                int[]     prev  = new int[mol.getAtomCount()];
                int[]     next  = new int[mol.getAtomCount()];
                for (int i = 0; i < mol.getAtomCount(); i++) {
                    next[i] = prev[i] = mol.getAtom(i).getAtomicNumber();
                }
                for (int rep = 0; rep < mol.getAtomCount(); rep++) {
                    for (BondRef bond : bonds) {
                        int begIdx = bond.getAtom(0).getIndex();
                        int endIdx = bond.getAtom(1).getIndex();
                        next[begIdx] += prev[endIdx];
                        next[endIdx] += prev[begIdx];
                    }
                    System.arraycopy(next, 0, prev, 0, next.length);
                }
                long t2 = System.nanoTime();
                tInit += t1 - t0;
                tRun += t2 - t1;
                for (int aNext : next) check += aNext;
            }
        }
    };


    public static void main(String[] args) {

        long t0 = System.nanoTime();

        OptionSet optset = optpar.parse(args);

        int    step   = optset.valueOf(stepSpec);
        String filter = optset.valueOf(filterSpec);

        int     pos     = 0;
        int     num     = 0;
        Pattern pattern = Pattern.compile(filter);
        while (pos < algs.length) {
            if (pattern.matcher(algs[pos].desc).find()) {
                algs[num++] = algs[pos];
            }
            pos++;
        }
        algs = Arrays.copyOf(algs, num);
        System.err.println("Filter: " + filter);

        System.err.printf("%20s\t%20s\t%-35s\t%15s\t%15s\t%15s\t%15s\t%15s\t%s\n",
                          "DataSet",
                          "DataRange",
                          "DataStruct\tAlgorithm\tTweak",
                          "tElap",
                          "tElapSlower",
                          "tInit",
                          "tRun",
                          "tRunSlower",
                          "CheckSum");

        for (File file : filesSpec.values(optset))
            try (InputStream in = new FileInputStream(args[0]);
                 Reader rdr = new InputStreamReader(in, StandardCharsets.UTF_8);
                 BufferedReader brdr = new BufferedReader(rdr)) {

                SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
                smipar.kekulise(false); // not needed here

                int count   = 0;
                int linenum = 0;

                String line;
                while ((line = brdr.readLine()) != null) {
                    ++count;
                    ++linenum;

                    if (count == step) {
                        report(file, count, linenum);
                        count=0;
                    }
                    try {
                        IAtomContainer mol = smipar.parseSmiles(line);
                        for (MeasureAlgorithm alg : algs) {
                            alg.process(mol);
                        }
                    } catch (InvalidSmilesException e) {
                        System.err.println("SMILES ERROR: " + e.getMessage());
                    }
                }

                report(file, count, linenum);
            } catch (IOException e) {

            }

        long t1 = System.nanoTime();
        System.err.printf("Total Elapsed Time: %.2f s\n", (t1 - t0) / 1e9);
    }

    private static void report(File file, int count, int linenum) {
        long bestElap = Long.MAX_VALUE;
        long bestRun  = Long.MAX_VALUE;
        for (MeasureAlgorithm alg : algs) {
            if (alg.tRun < bestRun)
                bestRun = alg.tRun;
            if (alg.tInit+alg.tRun < bestElap)
                bestElap = alg.tInit+alg.tRun;
        }
        for (MeasureAlgorithm alg : algs) {
            System.err.printf("%20s\t%20s\t%-35s\t%15s\t%15s\t%15s\t%15s\t%15s\t%d\n",
                              file.getName(),
                              String.format("%d-%d", linenum - count, linenum),
                              alg.desc,
                              String.format("%.1f", (alg.tInit + alg.tRun) / 1e6),
                              String.format("%.1fx", (alg.tInit + alg.tRun) / (double) bestElap),
                              String.format("%.1f", alg.tInit / 1e6),
                              String.format("%.1f", alg.tRun / 1e6),
                              String.format("%.1fx", alg.tRun / (double) bestRun),
                              alg.check);
            alg.tInit = 0;
            alg.tRun = 0;
        }
    }
}

rajarshi · 2017-03-15T22:45:31Z

Impressive! Looking forward to the blog post

johnmay · 2017-03-15T22:47:57Z

It's a win over even for trivial algorithms like DFS. In reality much more traversal is happening and since you only need to do the init once it's huge wins for algorithms like canonicalisation, substructure search, layout, etc... The Relaxation algorithm is morgan/canonical labelling.

Summary:
DepthFirst = ~3x faster total elapsed time, ~17x faster processing time.
Relaxation = 76-190x faster total elapsed time, 118-268x faster processing time.

DepthFirstSearch

java -jar cdkbmark.jar /data/chembl_22.smi -filter DepthFirst 2>/dev/null

Scroll Right-->

             DataSet	           DataRange	DataStruct	Algorithm	Tweak         	          tElap	    tElapSlower	          tInit	           tRun	     tRunSlower	CheckSum
       chembl_22.smi	            0-100000	AtomContainer	DepthFirst	          	          686.1	           2.9x	            0.0	          686.1	          16.7x	2959618
       chembl_22.smi	            0-100000	int[][]	DepthFirst	VisitFlag       	          588.8	           2.5x	          533.2	           55.6	           1.4x	2959618
       chembl_22.smi	            0-100000	int[][]	DepthFirst	VisitArray      	          536.5	           2.3x	          495.4	           41.2	           1.0x	2959618
       chembl_22.smi	            0-100000	AtomRef	DepthFirst	VisitFlag       	          354.2	           1.5x	          292.8	           61.4	           1.5x	2959618
       chembl_22.smi	            0-100000	AtomRef	DepthFirst	VisitArray      	          234.2	           1.0x	          177.2	           56.9	           1.4x	2959618
       chembl_22.smi	       100000-200000	AtomContainer	DepthFirst	          	          711.0	           3.1x	            0.0	          711.0	          17.4x	6032900
       chembl_22.smi	       100000-200000	int[][]	DepthFirst	VisitFlag       	          588.6	           2.6x	          536.5	           52.1	           1.3x	6032900
       chembl_22.smi	       100000-200000	int[][]	DepthFirst	VisitArray      	          543.8	           2.4x	          502.9	           40.9	           1.0x	6032900
       chembl_22.smi	       100000-200000	AtomRef	DepthFirst	VisitFlag       	          376.1	           1.6x	          316.7	           59.4	           1.5x	6032900
       chembl_22.smi	       100000-200000	AtomRef	DepthFirst	VisitArray      	          228.6	           1.0x	          172.3	           56.4	           1.4x	6032900
       chembl_22.smi	       200000-300000	AtomContainer	DepthFirst	          	          746.0	           3.2x	            0.0	          746.0	          18.3x	9120188
       chembl_22.smi	       200000-300000	int[][]	DepthFirst	VisitFlag       	          606.0	           2.6x	          554.3	           51.7	           1.3x	9120188
       chembl_22.smi	       200000-300000	int[][]	DepthFirst	VisitArray      	          560.7	           2.4x	          520.0	           40.7	           1.0x	9120188
       chembl_22.smi	       200000-300000	AtomRef	DepthFirst	VisitFlag       	          387.2	           1.7x	          327.8	           59.4	           1.5x	9120188
       chembl_22.smi	       200000-300000	AtomRef	DepthFirst	VisitArray      	          232.0	           1.0x	          175.7	           56.3	           1.4x	9120188
       chembl_22.smi	       300000-400000	AtomContainer	DepthFirst	          	          725.1	           3.1x	            0.0	          725.1	          18.4x	12257398
       chembl_22.smi	       300000-400000	int[][]	DepthFirst	VisitFlag       	          604.4	           2.6x	          554.1	           50.3	           1.3x	12257398
       chembl_22.smi	       300000-400000	int[][]	DepthFirst	VisitArray      	          559.3	           2.4x	          519.9	           39.4	           1.0x	12257398
       chembl_22.smi	       300000-400000	AtomRef	DepthFirst	VisitFlag       	          397.9	           1.7x	          338.8	           59.1	           1.5x	12257398
       chembl_22.smi	       300000-400000	AtomRef	DepthFirst	VisitArray      	          236.4	           1.0x	          180.7	           55.7	           1.4x	12257398
       chembl_22.smi	       400000-500000	AtomContainer	DepthFirst	          	          803.6	           3.3x	            0.0	          803.6	          19.1x	15499484
       chembl_22.smi	       400000-500000	int[][]	DepthFirst	VisitFlag       	          644.5	           2.6x	          592.1	           52.4	           1.2x	15499484
       chembl_22.smi	       400000-500000	int[][]	DepthFirst	VisitArray      	          602.5	           2.4x	          560.5	           42.0	           1.0x	15499484
       chembl_22.smi	       400000-500000	AtomRef	DepthFirst	VisitFlag       	          405.4	           1.6x	          344.3	           61.0	           1.5x	15499484
       chembl_22.smi	       400000-500000	AtomRef	DepthFirst	VisitArray      	          247.2	           1.0x	          189.6	           57.6	           1.4x	15499484
       chembl_22.smi	       500000-600000	AtomContainer	DepthFirst	          	          919.8	           3.8x	            0.0	          919.8	          22.5x	18713238
       chembl_22.smi	       500000-600000	int[][]	DepthFirst	VisitFlag       	          682.0	           2.8x	          630.0	           52.1	           1.3x	18713238
       chembl_22.smi	       500000-600000	int[][]	DepthFirst	VisitArray      	          640.0	           2.6x	          599.1	           40.9	           1.0x	18713238
       chembl_22.smi	       500000-600000	AtomRef	DepthFirst	VisitFlag       	          406.4	           1.7x	          345.5	           60.9	           1.5x	18713238
       chembl_22.smi	       500000-600000	AtomRef	DepthFirst	VisitArray      	          244.6	           1.0x	          187.3	           57.3	           1.4x	18713238

Relaxation

java -jar cdkbmark.jar /data/chembl_22.smi -filter Relaxation 2>/dev/null

Scroll Right-->

             DataSet	           DataRange	DataStruct	Algorithm	Tweak         	          tElap	    tElapSlower	          tInit	           tRun	     tRunSlower	CheckSum
       chembl_22.smi	            0-100000	AtomContainer	Relaxation	          	        41884.5	          76.4x	            0.0	        41884.5	         118.8x	36642796741133
       chembl_22.smi	            0-100000	AtomContainer	Relaxation	Improved  	         2819.6	           5.1x	            0.0	         2819.6	           8.0x	36642796741133
       chembl_22.smi	            0-100000	AtomRef	Relaxation	                	         1371.7	           2.5x	          322.8	         1048.9	           3.0x	36642796741133
       chembl_22.smi	            0-100000	BondRef	Relaxation	                	          548.4	           1.0x	          195.9	          352.5	           1.0x	36642796741133
       chembl_22.smi	       100000-200000	AtomContainer	Relaxation	          	        43119.9	          75.8x	            0.0	        43119.9	         118.2x	69677036804308
       chembl_22.smi	       100000-200000	AtomContainer	Relaxation	Improved  	         2874.2	           5.1x	            0.0	         2874.2	           7.9x	69677036804308
       chembl_22.smi	       100000-200000	AtomRef	Relaxation	                	         1435.3	           2.5x	          335.1	         1100.1	           3.0x	69677036804308
       chembl_22.smi	       100000-200000	BondRef	Relaxation	                	          568.6	           1.0x	          203.7	          364.9	           1.0x	69677036804308
       chembl_22.smi	       200000-300000	AtomContainer	Relaxation	          	        55235.3	          93.6x	            0.0	        55235.3	         142.1x	91700990495618
       chembl_22.smi	       200000-300000	AtomContainer	Relaxation	Improved  	         3659.4	           6.2x	            0.0	         3659.4	           9.4x	91700990495618
       chembl_22.smi	       200000-300000	AtomRef	Relaxation	                	         1507.6	           2.6x	          337.0	         1170.6	           3.0x	91700990495618
       chembl_22.smi	       200000-300000	BondRef	Relaxation	                	          589.8	           1.0x	          201.1	          388.7	           1.0x	91700990495618
       chembl_22.smi	       300000-400000	AtomContainer	Relaxation	          	        41381.5	          72.4x	            0.0	        41381.5	         112.2x	111401301444894
       chembl_22.smi	       300000-400000	AtomContainer	Relaxation	Improved  	         2752.8	           4.8x	            0.0	         2752.8	           7.5x	111401301444894
       chembl_22.smi	       300000-400000	AtomRef	Relaxation	                	         1454.5	           2.5x	          349.5	         1105.0	           3.0x	111401301444894
       chembl_22.smi	       300000-400000	BondRef	Relaxation	                	          571.3	           1.0x	          202.5	          368.9	           1.0x	111401301444894
       chembl_22.smi	       400000-500000	AtomContainer	Relaxation	          	        64022.6	         100.0x	            0.0	        64022.6	         150.6x	138124018920605
       chembl_22.smi	       400000-500000	AtomContainer	Relaxation	Improved  	         4214.6	           6.6x	            0.0	         4214.6	           9.9x	138124018920605
       chembl_22.smi	       400000-500000	AtomRef	Relaxation	                	         1634.1	           2.6x	          357.4	         1276.7	           3.0x	138124018920605
       chembl_22.smi	       400000-500000	BondRef	Relaxation	                	          640.4	           1.0x	          215.2	          425.1	           1.0x	138124018920605

johnmay · 2017-03-15T22:51:12Z

The main thing I wanted to test here was how much slower AtomRef/BondRef was than the int[][]. It turns out to be slightly faster to process with int[][] but slower to initialize (I could speed it up - I don't actually use the EdgeToBondMap but I wanted apples-to-apples on the initalization).

egonw · 2017-03-20T10:07:17Z

Nice!

…on tables.

…ntainerRef that provides improved performance on the existing API.

gilleain · 2017-04-28T07:11:05Z

Looks good. I've also had to use the approach of pre-computing a table of connections and this looks like a much neater way of doing it.

johnmay · 2017-04-29T11:13:59Z

Having thought more about this I think I do some API acrobatics and fit it in portably with the current IAtom/IAtomContainer... more to follow (will probably be a different pull request)

johnmay · 2017-09-13T21:23:53Z

Replaced by #368... ek this was back in March!

egonw added the non-final label Mar 26, 2017

johnmay added 3 commits April 24, 2017 22:47

AtomRef and BondRef implementations providing read optimised connecti…

d5f2fe1

…on tables.

Optimize atom and bond ref.

9925348

Rejig the Atom/Bond Ref implementations and manage them via an AtomCo…

c92ff06

…ntainerRef that provides improved performance on the existing API.

johnmay force-pushed the patch/readoptdatastructures branch from 4683f4e to c92ff06 Compare April 24, 2017 21:48

johnmay mentioned this pull request Jun 20, 2017

Faster AtomContainer Prep Work #339

Merged

johnmay closed this Sep 13, 2017

johnmay deleted the patch/readoptdatastructures branch March 18, 2018 20:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read Optimised Data Structures #278

Read Optimised Data Structures #278

johnmay commented Mar 14, 2017

egonw commented Mar 15, 2017

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017

johnmay commented Mar 15, 2017 •

edited

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017 •

edited

johnmay commented Mar 15, 2017

egonw commented Mar 20, 2017

gilleain commented Apr 28, 2017

johnmay commented Apr 29, 2017

johnmay commented Sep 13, 2017

Read Optimised Data Structures #278

Read Optimised Data Structures #278

Conversation

johnmay commented Mar 14, 2017

egonw commented Mar 15, 2017

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017

johnmay commented Mar 15, 2017 • edited

rajarshi commented Mar 15, 2017

johnmay commented Mar 15, 2017 • edited

DepthFirstSearch

Relaxation

johnmay commented Mar 15, 2017

egonw commented Mar 20, 2017

gilleain commented Apr 28, 2017

johnmay commented Apr 29, 2017

johnmay commented Sep 13, 2017

johnmay commented Mar 15, 2017 •

edited

johnmay commented Mar 15, 2017 •

edited